Introduction


Until recently, the deterministic view of Nature prevailed in Physics.
Starting from the Enlightenment, there was the belief that Mechanics was
completely described by Newton's laws. If the initial conditions of a
mechanical system were known, then its state could be completely
determined in the future as well as in the past. This was precisely the
viewpoint of Lagrange and it ruled from the 18th Century to around the
middle of the 20th Century.

However, this Panglossian view of Nature was disputed, especially with the
advent of Thermodynamics, which challenged the belief in reversibility and
started the study of complex random systems. In addition, the discovery of
simple systems with chaotic behavior, which, despite being deterministic,
exhibit very complex behavior, showed that mechanical problems are far from
simple. Quantum Theory, moreover, showed that the processes in Nature are
not deterministic, but stochastic. This change in paradigm is not yet accepted,
as is well described in the book of Thomas Kuhn that discusses sociological
aspects of science.

The main ideas of probability were slow to develop, perhaps because
chance events were interpreted as the wish of the gods and hence were not
believed to be random. The first theoretical works on probability were
connected with games of chance and, since the set of possibilities is finite,
these studies are combinatorial; the big breakthrough came when the
continuous case was studied.

Stochastic modeling is, of course, harder than deterministic modeling and
the implementation of the model is more costly. Let us look at this in a simple
example. In the simplest deterministic continuous model, the parameters of
the model are constants. The equivalent of this in the stochastic case is to
transform the constants into random variables. For each realization of the
random variable, its value is also a constant, but the constant may change with
the realization, following a certain probability distribution. So, the
simplest objects of a stochastic model, random variables, are
functions, hence objects of infinite dimension. Also, in a coherent
deterministic model, we expect a solution if we fix suitable conditions. In a
stochastic model this is not the case: for each realization of the stochastic
data, the parameters are chosen and a deterministic problem is solved. Then,
with the set of solutions obtained, statistics are computed and the main result of
the problem is the probability distribution of the results. Thus, the main
element to be determined is the distribution of the possible solutions. The
values obtained from a single realization are of no importance: the
distribution of the values is the important result. The understanding of this
fact came very slowly to the manufacturing industry, for example, and only
after long years of application of quality control in manufacturing came
the realization that they were dealing with a process stochastic in nature.
Since then, stochastic modeling has been essential in Engineering. Today,
reliability represents a new way of design and it takes into account the
inevitable uncertainties of the processes.

The solution of a stochastic model can be decomposed into three steps:
generation of the stochastic data following their probability distribution,
solution of a deterministic problem for each sample generated, and finally
computation of statistics of the results (for example, to construct a
histogram) until they show a certain persistence (i.e. they no longer change
by more than a given error criterion). A histogram is the best approximation
of the solution of a stochastic problem. If
histograms are hard to find, or costly to compute, we make do with the mean and
dispersion, or a certain number of moments. In some situations, such as the
ones described in Chapter 8, we compute only a particular probability since
computation of moments is too expensive.
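
The three steps can be sketched on a toy problem. The model used here (the scalar equation a x = b, with a uniformly distributed on (1, 2)) is an assumption chosen for illustration, not an example taken from the text, and all variable names are hypothetical.

```matlab
% Toy illustration of the three steps (assumed model: a*x = b, a uncertain).
rng(0);                           % fixed seed, for repeatability only
ns = 10000;                       % number of realizations
b = 1;                            % deterministic right-hand side
a = 1 + rand(ns, 1);              % step 1: sample a, uniform on (1, 2)
x = b ./ a;                       % step 2: one deterministic solve per sample
mx = mean(x);                     % step 3: statistics of the solutions
sx = std(x, 1);
[counts, centers] = hist(x, 20);  % 20-bin histogram of the solutions
```

For this particular model the exact mean of x is ln 2, which gives a reference value for checking how the statistics settle as ns grows.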

This book presents the main ideas of Stochastic Modeling and Uncertainty
Quantification using Functional Analysis as the main tool. More specifically,
Hilbert Spaces and orthogonal projections are the basic objects leading to the
methods described. As shown in the text, some ideas often considered as
complex, such as Conditional Expectations, may be developed in a systematic
way by considering their definition as orthogonal projections in convenient
Hilbert spaces.

Chapter 1 discusses the main ideas of Probability Theory in a Hilbert
context. This chapter is designed as a reference, but the main concepts of
random variables and random processes are developed. The reader with
previous knowledge of probability theory and orthogonal projections may
start reading from Chapter 2, which presents the construction of a stochastic
model by using the Principle of Maximum Entropy. When the data is
formed by independent random variables, the generation of a model by this
procedure may be considered as simple. But in the case where dependent
random variables or a random process have to be considered,
difficulties appear and new tools have to be introduced. The main ideas
concerning the generation of samples from random vectors and processes are
discussed in this chapter. Chapter 3 presents the problem of approximation of
a random variable by a function of another random variable and the general
methods which may be used in order to numerically determine this
approximation. The following chapters deal with applications of these
methods in order to solve particular problems. Chapter 4 considers linear
systems with uncertainties; Chapter 5 presents methods for nonlinear systems
with uncertainties; Chapter 6 deals with differential equations; and the next
two chapters, i.e. Chapters 7 and 8, present methods for optimization under
uncertainties.

In order to help the reader interested in practical applications, listings of 
Matlab® programs implementing the main methods are included in the text. 
In each chapter, the reader will find these listings which complete the 
methodological presentation by a practical implementation. Of course, these 
programs are to be considered as examples of implementation and not as 
optimized ones. The reader may construct their own implementations - we 
expect that this work will be facilitated by the examples of implementation 
given in the text. 

The authors would like to thank ISTE for the reception of this project and
the publication of this work.
Elements of Probability Theory
and Stochastic Processes


The element which constitutes the foundation of the construction of
stochastic algorithms is the concept of random variable, i.e. a function
$X : \Omega \to \mathbb{R}$ for which individual values $X(\omega)$ ($\omega \in \Omega$) are not available or
simply not interesting, and for which we are looking for global information
connected to $X$.


For instance, we may consider $\Omega$ as the population of a geographic region
(country, town, etc.) and numerical quantities connected to each individual $\omega$:
age, distance or transportation time from a residence to work, level of studies,
revenue in the past year, etc. Each of these characteristics may be considered
as deterministic, since it is perfectly determined for a given individual $\omega$.
But to obtain the global information for all the individuals may become
expensive (recall the cost of a census) and errors may occur in the process. In
addition, maybe we are not interested in a particular individual, but only in
groups of individuals or in global quantities such as "how many individuals
are more than 60 years old?", "what is the fraction of individuals needing
more than one hour of transportation time?", "how many individuals have
finished university?", "how many families have an income lower than ... ?". In
this case, we may look at the quantities under consideration as random
variables.

These examples show that random variables may be obtained by
considering numerical characteristics of finite populations. This kind of
variable is introduced in section 1.2, which gives a comprehensible introduction
to random variables and illustrates their practical use, namely for the
numerical calculation of statistics. In the general situation, random variables
may be defined on general abstract sets $\Omega$ by using the notions of measure
and probability (see section 1.7).

1.1. Notation 

Let us denote by $\mathbb{N}$ the set of the natural numbers, and by $\mathbb{N}^*$ the set of the
strictly positive natural numbers. $\mathbb{R}$ denotes the set of the real numbers
$(-\infty, +\infty)$ and the notation $\mathbb{R}_e$ refers to the extended real numbers:
$\mathbb{R} \cup \{-\infty, +\infty\}$. $(a, b) = \{x \in \mathbb{R} : a < x < b\}$ denotes an open interval of
real numbers.

For $k \in \mathbb{N}^*$, $\mathbb{R}^k = \{x = (x_1, ..., x_k) : x_n \in \mathbb{R}, 1 \le n \le k\}$ is the set of
the $k$-tuples formed by real numbers. Analogous notation is used for $\mathbb{R}_e^k$. We
denote by $|\cdot|_p$ the standard $p$-norm of $\mathbb{R}^k$. The standard scalar product of two
elements $x$ and $y$ of $\mathbb{R}^k$ is:

$$(x, y)_k = \sum_{i=1}^{k} x_i y_i .$$

When the context is clear enough to avoid any confusion, the scalar
product on $\mathbb{R}^k$ is simply denoted by a dot:

$$x.y = (x, y)_k .$$

We will use analogous notation for matrices: $A = (A_{ij})_{1 \le i \le m,\, 1 \le j \le n}$
denotes an $m \times n$ matrix formed by real numbers. The set formed by all the
$m \times n$ matrices formed by real numbers is denoted by $\mathcal{M}(m, n)$. Usually, the
elements of $\mathbb{R}^k$ may be seen either as elements of $\mathcal{M}(k, 1)$ (column vectors)
or elements of $\mathcal{M}(1, k)$ (row vectors). In this text, we consider the elements of
$\mathbb{R}^k$ as column vectors, i.e. elements of $\mathcal{M}(k, 1)$. This allows us to write the
product of an $m \times k$ matrix $A \in \mathcal{M}(m, k)$ and an element
$x \in \mathbb{R}^k = \mathcal{M}(k, 1)$:

$$y = Ax \in \mathcal{M}(m, 1) = \mathbb{R}^m .$$

As usual, we have $y = (y_1, ..., y_m)$, with

$$y_i = \sum_{j=1}^{k} A_{ij} x_j .$$
Sometimes, in order to simplify the expressions, we use the convention of
the implicit sum on repeated indexes (called Einstein's convention) by simply
writing $y_i = A_{ij} x_j$ in this case.

1.2. Numerical characteristics of finite populations 

Counts or surveys of the numerical characteristics of finite populations are
often generated by censuses, generally made at regular time intervals. For
instance, the number of inhabitants, the family income, the type of the house
or employment, the school or professional level of the members of the family
are some examples of characteristics that are periodically verified.

From the mathematical standpoint, a numerical characteristic $X$ defined
on a finite population $\Omega$ is a numerical function $X : \Omega \to \mathbb{R}$. As previously
observed, an important feature of censuses is that, usually, we are not interested
in the particular value $X(\omega)$ of the numerical characteristic for a particular
individual $\omega \in \Omega$, but in global quantities connected to $X$, i.e. in the global
behavior of $X$ on the population $\Omega$: what fraction of the population
is younger or older than given bounds, what part of the population
has revenues lower or higher than given bounds, etc. Thus, a frequent
framework is the following:

1) $\Omega = \{\omega_1, ..., \omega_N\}$ is a finite, non-empty population ($N \ge 1$).

2) $X : \Omega \to \mathbb{R}$ is a numerical characteristic defined on $\Omega$, having as an
image a set of $k$ distinct values: $X(\Omega) = \{X_1, ..., X_k\}$, $X_i \ne X_j$ for $i \ne j$.
Since these values are distinct, they may be arranged in strictly increasing
order: if necessary, we may assume that $X_i < X_j$ for $i < j$ without loss of
generality.

3) The inverse image of the value $X_i \in X(\Omega)$ is the subpopulation
$H_i = X^{-1}(\{X_i\}) = \{\omega \in \Omega : X(\omega) = X_i\}$. The number of elements
of $H_i$ is $\#(H_i) = n_i$ and the relative frequency or probability of $X_i$ is
$P(X = X_i) = p_i = n_i/N$.

4) Let $A \subset \mathbb{R}$: we define $P(X \in A) = n_A/N$, where $n_A = \#(X^{-1}(A))$ is
the number of elements of $X^{-1}(A)$.

We have:

$$\sum_{i=1}^{k} p_i = 1 \quad \text{and} \quad \sum_{i=1}^{k} n_i = N .$$

In addition, since the number of elements of $H_i \subset \Omega$ does not exceed the
number of elements $N$ of the global population $\Omega$, we have $n_i \le N$. Thus,

$$0 \le p_i \le 1 .$$

Definition 1.1 (mean on a finite population).- The mean, or expectation, of
$X$ is:

$$E(X) = \frac{1}{N} \sum_{n=1}^{N} X(\omega_n) .$$ ■

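Proposition 1.1.- The mean is a linear operation (i.e. $E(\cdot)$ is linear): let
$X : \Omega \to \mathbb{R}$ and $Y : \Omega \to \mathbb{R}$ be two numerical characteristics of the
population $\Omega$ and $\alpha$, $\beta$ be two real numbers. Then:

$$E(\alpha X + \beta Y) = \alpha E(X) + \beta E(Y) .$$ ■

Proof.- We have:

$$E(\alpha X + \beta Y) = \frac{1}{N} \sum_{n=1}^{N} \left( \alpha X(\omega_n) + \beta Y(\omega_n) \right),$$

which shows that:

$$E(\alpha X + \beta Y) = \frac{\alpha}{N} \sum_{n=1}^{N} X(\omega_n) + \frac{\beta}{N} \sum_{n=1}^{N} Y(\omega_n) = \alpha E(X) + \beta E(Y) .$$ ■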

Proposition 1.2.- If $X \ge 0$ then $E(X) \ge 0$. ■

Proof.- Since $X \ge 0$, we have $X(\omega) \ge 0$, $\forall \omega \in \Omega$. Thus $E(X)$ is a sum of
non-negative terms and we have $E(X) \ge 0$. ■

Definition 1.2 (variance on a finite population).- The variance of $X$ is:

$$V(X) = \frac{1}{N} \sum_{n=1}^{N} \left( X(\omega_n) - E(X) \right)^2 .$$

The standard deviation of $X$ is the square root of the variance:
$\sigma(X) = \sqrt{V(X)}$. ■
Proposition 1.3.- We have:

$$V(X) = E(X^2) - [E(X)]^2 .$$ ■

Proof.- Since:

$$\left( X(\omega_n) - E(X) \right)^2 = X(\omega_n)^2 - 2 E(X) X(\omega_n) + [E(X)]^2 ,$$

we have:

$$V(X) = \frac{1}{N} \left[ \sum_{n=1}^{N} X(\omega_n)^2 - 2 E(X) \sum_{n=1}^{N} X(\omega_n) + [E(X)]^2 \sum_{n=1}^{N} 1 \right]$$

and

$$V(X) = E(X^2) - 2 [E(X)]^2 + [E(X)]^2 = E(X^2) - [E(X)]^2 .$$ ■

Proposition 1.4.- $V(X) \ge 0$. In addition, $V(X) = 0$ if and only if $X$ is
constant on $\Omega$. ■

Proof.- The first assertion is immediate, since $V(X)$ is a sum of
non-negative terms (sum of squares). For the second, we observe that a sum
of squares is null if and only if each term in the sum is null. Thus, $V(X) = 0$
if and only if $X(\omega_n) = E(X)$ for all $n$, $1 \le n \le N$. ■

Corollary 1.1.- $E(X^2) \ge 0$ and $E(X^2) = 0$ if and only if $X = 0$. ■

Proof.- We have $E(X^2) = V(X) + [E(X)]^2$. Thus, on the one hand,
$E(X^2) \ge 0$ and, on the other hand, $E(X^2) = 0$ if and only if $V(X) = 0$
and $E(X) = 0$. ■
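
The definitions and Proposition 1.3 can be checked numerically on a small population. The six values below are chosen only for illustration, and the variable names are hypothetical.

```matlab
% Small finite population (values chosen for illustration only).
X = [2 4 4 5 7 8];
N = length(X);
EX = sum(X)/N;                  % mean, Definition 1.1
VX = sum((X - EX).^2)/N;        % variance, Definition 1.2
EX2 = sum(X.^2)/N;              % E(X^2)
VX_alt = EX2 - EX^2;            % right-hand side of Proposition 1.3
```

Both VX and VX_alt evaluate to 4 for this population (EX is 5 and EX2 is 29).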

Definition 1.3 (moments on a finite population).- The moment of order $p$,
or $p$-moment, of $X$ is:

$$M_p(X) = E(X^p) = \frac{1}{N} \sum_{n=1}^{N} \left( X(\omega_n) \right)^p .$$ ■

The whole body of information on the global behavior of a numerical
characteristic $X$ may be synthesized by a frequency table, which is a table
giving the possible values of $X$ and their relative frequencies:

    X | X_1 | X_2 | ... | X_k
    n | n_1 | n_2 | ... | n_k

    X | X_1 | X_2 | ... | X_k
    p | p_1 | p_2 | ... | p_k

Table 1.1. Synthesizing information on X in a frequency table


Frequency tables do not contain any information about the value of $X$ for
a particular $\omega \in \Omega$, but only global information about $X$ on $\Omega$. Nevertheless,
they permit the evaluation of global quantities involving $X$. For instance,

Lemma 1.1.- Let $g : \mathbb{R} \to \mathbb{R}$ be a function and $Z = g(X)$. Then:

$$E(Z) = \sum_{i=1}^{k} p_i \, g(X_i) .$$ [1.1] ■

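Proof.- Since $\bigcup_{i=1}^{k} H_i = \Omega$ and $g(X(\omega)) = g(X_i)$ on $H_i$, we have:

$$E(Z) = \frac{1}{N} \sum_{i=1}^{k} \sum_{\omega \in H_i} g(X(\omega)) = \frac{1}{N} \sum_{i=1}^{k} n_i \, g(X_i) = \sum_{i=1}^{k} p_i \, g(X_i) .$$ ■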

Proposition 1.5.- $M_p(X) = \sum_{i=1}^{k} p_i X_i^p$. Namely, $E(X) = \sum_{i=1}^{k} p_i X_i$
and $E(X^2) = \sum_{i=1}^{k} p_i X_i^2$. ■

Proof.- The result follows straight from the preceding lemma. ■
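
As a worked illustration of Lemma 1.1 and Proposition 1.5, consider a hypothetical population of $N = 6$ individuals with values $2, 4, 4, 5, 7, 8$ (numbers chosen only for this example). The distinct values are $2, 4, 5, 7, 8$, with counts $1, 2, 1, 1, 1$, so that the frequency table alone gives:

```latex
\[
E(X) = \sum_{i=1}^{5} p_i X_i
     = \frac{1 \cdot 2 + 2 \cdot 4 + 1 \cdot 5 + 1 \cdot 7 + 1 \cdot 8}{6}
     = \frac{30}{6} = 5 ,
\]
\[
E(X^2) = \sum_{i=1}^{5} p_i X_i^2
       = \frac{1 \cdot 4 + 2 \cdot 16 + 1 \cdot 25 + 1 \cdot 49 + 1 \cdot 64}{6}
       = \frac{174}{6} = 29 ,
\qquad
V(X) = E(X^2) - [E(X)]^2 = 29 - 25 = 4 .
\]
```

The same mean is obtained from the individual values, $(2 + 4 + 4 + 5 + 7 + 8)/6 = 5$, as the lemma states.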

The information contained in a frequency table may also be given by a
cumulative distribution function, often referred to as cumulative function or
distribution function:

$$F(x) = P(X < x) = P(X \in (-\infty, x)) .$$

    x    | x <= X_1 | X_1 < x <= X_2 | X_2 < x <= X_3 | ... | X_{k-1} < x <= X_k  | X_k < x < +oo
    F(x) | 0        | p_1            | p_1 + p_2      | ... | p_1 + ... + p_{k-1} | 1

Table 1.2. Synthesizing information on X in a cumulative distribution function
We observe that $F$ admits a distributional derivative (i.e. a weak derivative)
and that $f = F'$ is a sum of $k$ Dirac measures (i.e. Dirac masses):
$f(x) = \sum_{i=1}^{k} p_i \, \delta(x - X_i)$. $f$ is the probability density of $X$: in this situation, $f$ is
a distribution; we consider later situations where this probability density is a
standard function.


1.3. Matlab implementation 
1.3.1. Using a data vector 

Matlab furnishes simple commands which evaluate statistics from data
vectors. For instance, if the data is given by a vector $X = (X_1, ..., X_N)$,
where $X_n = X(\omega_n)$, $1 \le n \le N$, you may use one or several of the
following commands:

Listing 1.1. Statistics of a data vector

ma = sum(X)/length(X);                  % mean of vector X
mb = mean(X);                           % mean of vector X
v1a = sum((X - ma).^2)/length(X);       % variance of vector X
v1b = var(X,1);                         % variance of vector X
v2 = sum((X - ma).^2)/(length(X) - 1);  % unbiased estimate of the variance of vector X
v1c = var(X);                           % unbiased estimate of the variance of vector X
s1a = sqrt(v1a);                        % standard deviation of vector X
s1b = std(X,1);                         % standard deviation of vector X
s1c = std(X);                           % standard deviation from the unbiased variance estimate
s2 = sqrt(v2);                          % standard deviation from the unbiased variance estimate
m_p = mean(X.^p);                       % moment of order p of vector X (p must be defined)
mc_p = mean((X - ma).^p);               % centered moment of order p of vector X
The cumulative distribution may be obtained by using:

Listing 1.2. Cdf of a data vector

function [F_Y] = sample2cfd(ys, y)
%
% determines the empirical cumulative function associated to a sample
% ys from a random variable y. F_Y is determined at the abscissa y.
%
% IN:
% ys: sample of y - type array of double
% y: vector of the abscissa for evaluation of F_Y - type array of double
%
% OUT:
% F_Y: vector containing the values of the cdf - type array of double
%      F_Y(i) = P(Y < y(i)) is the cdf at y(i)
%
F_Y = zeros(size(y));
ns = max(size(ys));          % sample size
n = max(size(y));            % number of evaluation points
for i = 1:n
    ya = y(i);
    ind = find(ys < ya);     % sample points strictly below ya
    if isempty(ind)
        na = 0;
    else
        na = max(size(ind));
    end;
    pa = na/ns;
    F_Y(i) = pa;
end;
return;
end


The associated probability density may be determined by numerical
differentiation of the cumulative distribution function:
Listing 1.3. Pdf of a data vector

function fY = cdf2pdf(FY,y,h)
%
% determines the density function associated to the cumulative function
% FY evaluated at the equally spaced abscissa y.
% fY is determined by particle derivative.
%
% IN:
% FY: values of the cdf - type array of double
% y: vector of the abscissa - type array of double
% h: radius of influence - type double
%    defines how many neighbour points are used in the differentiation
%    for equally spaced data, h = a small (1 to 5) multiple of the step
%    often 3 is used
%    the larger h is, the smoother the pdf is.
%
% OUT:
% fY: vector containing the values of the pdf - type array of double
%
g = @(y,x) exp(-0.5*((y-x)/h)^2)/(h*sqrt(2*pi));  % Gaussian kernel
dg = @(y,x) -((y-x)/h)*(1/h)*g(y,x);              % its derivative with respect to y
%
ny = length(y);
v = zeros(size(y));
for i = 1:ny                 % kernel sum at each abscissa
    s1 = 0;
    for j = 1:ny
        aux = g(y(j),y(i));
        s1 = s1 + aux;
    end;
    v(i) = s1;
end;
fY = zeros(size(FY));
for i = 1:ny                 % ratio of weighted differences
    s1 = 0;
    s2 = 0;
    for j = 1:ny
        aux = dg(y(j),y(i))/v(j);
        s1 = s1 + aux*(y(j) - y(i));
        s2 = s2 + aux*(FY(j) - FY(i));
    end;
    fY(i) = s2/s1;
end;
return;
end
An example is shown below (Figure 1.1): we generate a random sample of
250 variates from the Gaussian distribution $N(0, 1)$ and we determine both the
cdf and the pdf by using these programs.


Figure 1.1. Results for a sample of the Gaussian distribution. For a color
version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip
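
A rough, self-contained check of the empirical cdf idea may be sketched as follows. It does not call the listings of this section: the fraction of sample points below each abscissa is computed directly and compared with the exact N(0, 1) cdf. The seed, the grid and the variable names are assumptions made for illustration.

```matlab
rng(1);                                   % fixed seed (illustrative)
ys = randn(250, 1);                       % sample of 250 variates from N(0,1)
y = linspace(-4, 4, 81);                  % abscissae where the cdf is evaluated
F_emp = zeros(size(y));
for i = 1:numel(y)
    F_emp(i) = sum(ys < y(i))/numel(ys);  % fraction of the sample below y(i)
end
F_ref = 0.5*(1 + erf(y/sqrt(2)));         % exact N(0,1) cdf, for comparison
err = max(abs(F_emp - F_ref));            % largest gap on the grid
```

The gap err shrinks as the sample size grows, so it gives a simple way of judging whether 250 variates are enough for a given purpose.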


1.3.2. Using a frequency table 

Assuming that the data is summarized in two vectors $X = (X_1, ..., X_k)$
and $nX = (n_1, ..., n_k)$, you may use one or several of the following
commands:

Listing 1.4. Statistics of a frequency table

ma = sum(nX.*X)/sum(nX);                  % mean of X
v1a = sum(nX.*(X - ma).^2)/sum(nX);       % variance of X
v2 = sum(nX.*(X - ma).^2)/(sum(nX) - 1);  % unbiased estimate of the variance of X
s1a = sqrt(v1a);                          % standard deviation of X
s2 = sqrt(v2);                            % standard deviation from the unbiased variance estimate
m_p = sum(nX.*(X.^p))/sum(nX);            % moment of order p of X (p must be defined)
mc_p = sum(nX.*(X - ma).^p)/sum(nX);      % centered moment of order p of X
Notice that the relative frequencies may be obtained as follows:

Listing 1.5. Relative frequencies from the absolute ones

function pys = abs2relfreq(nys)
%
% generates the relative frequencies pys from the
% absolute frequencies nys
%
% IN:
% nys: table of absolute frequencies (numbers of occurrences): type array of integer
%
% OUT:
% pys: table of relative frequencies: type array of double
%
ns = sum(nys);
pys = nys/ns;
return;
end


Analogous to the preceding situation, the associated probability density
may be determined by numerical differentiation of the cumulative distribution function.
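
The link between a data vector and its frequency table can be sketched as follows. The raw data are illustrative, the names are hypothetical, and the use of unique and accumarray is one possible way of counting occurrences, not necessarily the one intended in the text.

```matlab
Xs = [3; 1; 2; 3; 3; 1; 2; 3];          % raw data (illustrative values)
[X, ~, idx] = unique(Xs);                % distinct values, in increasing order
nX = accumarray(idx(:), 1);              % absolute frequencies n_i
pX = nX/sum(nX);                         % relative frequencies p_i
ma = sum(nX.*X)/sum(nX);                 % mean from the frequency table
v1a = sum(nX.*(X - ma).^2)/sum(nX);      % variance from the frequency table
```

The values ma and v1a coincide with mean(Xs) and var(Xs,1) computed on the raw data, which is the content of Proposition 1.5.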


1.4. Couples of numerical characteristics 

Let us consider a pair $(X, Y)$ of numerical characteristics $X, Y : \Omega \to \mathbb{R}$
such that $X(\Omega) = \{X_1, ..., X_k\}$ and $Y(\Omega) = \{Y_1, ..., Y_m\}$. Since $X$ and $Y$
are connected by $\omega$, we refer to $(X, Y)$ as a couple of numerical characteristics
on $\Omega$.

Analogous to the preceding situation where a single numerical
characteristic has been considered, the inverse image of $(X_i, Y_j)$ is
$H_{ij} = X^{-1}(\{X_i\}) \cap Y^{-1}(\{Y_j\}) = \{\omega \in \Omega : X(\omega) = X_i \text{ and } Y(\omega) = Y_j\}$. The
number of elements of $H_{ij}$ is $\#(H_{ij}) = n_{ij}$ and the relative frequency (i.e.
the probability) of $(X_i, Y_j)$ is $P(X = X_i, Y = Y_j) = p_{ij} = n_{ij}/N$. Here,

$$\sum_{i=1}^{k} \sum_{j=1}^{m} p_{ij} = 1 \quad \text{and} \quad \sum_{i=1}^{k} \sum_{j=1}^{m} n_{ij} = N .$$

The information about the global behavior of the couple $(X, Y)$ may be
synthesized in a contingency table (or cross-tabulation table).

          | Y_1    | ... | Y_m
    X_1   | p_11   | ... | p_1m
    ...   | ...    | ... | ...
    X_k   | p_k1   | ... | p_km

Table 1.3. Synthesizing information on (X, Y) in a contingency table


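Analogous to frequency tables, contingency tables permit the evaluation of
global quantities involving functions of the couple:

Lemma 1.2.- Let $g : \mathbb{R}^2 \to \mathbb{R}$ be a function and $Z = g(X, Y)$. Then:

$$E(Z) = \sum_{i=1}^{k} \sum_{j=1}^{m} p_{ij} \, g(X_i, Y_j) .$$ [1.2] ■

Proof.- Since $\bigcup_{i=1}^{k} \bigcup_{j=1}^{m} H_{ij} = \Omega$, we have:

$$\sum_{n=1}^{N} g(X(\omega_n), Y(\omega_n)) = \sum_{i=1}^{k} \sum_{j=1}^{m} \sum_{\omega \in H_{ij}} g(X(\omega), Y(\omega)) ,$$

so that:

$$\sum_{n=1}^{N} g(X(\omega_n), Y(\omega_n)) = \sum_{i=1}^{k} \sum_{j=1}^{m} n_{ij} \, g(X_i, Y_j) .$$

Dividing by $N$ gives [1.2]. ■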


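Thus,

$$X^{-1}(\{X_i\}) = H_{i\bullet} = \bigcup_{j=1}^{m} H_{ij} ; \qquad Y^{-1}(\{Y_j\}) = H_{\bullet j} = \bigcup_{i=1}^{k} H_{ij} ,$$

and we have:

$$P(X = X_i) = p_{i\bullet} = \sum_{j=1}^{m} p_{ij} ; \qquad P(Y = Y_j) = p_{\bullet j} = \sum_{i=1}^{k} p_{ij} .$$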
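As a consequence:

$$E(X) = \sum_{i=1}^{k} p_{i\bullet} X_i = \sum_{i=1}^{k} \sum_{j=1}^{m} p_{ij} X_i ;$$

$$E(Y) = \sum_{j=1}^{m} p_{\bullet j} Y_j = \sum_{i=1}^{k} \sum_{j=1}^{m} p_{ij} Y_j .$$

In an analogous way,

$$V(X) = \sum_{i=1}^{k} p_{i\bullet} \left( X_i - E(X) \right)^2 = \sum_{i=1}^{k} \sum_{j=1}^{m} p_{ij} \left( X_i - E(X) \right)^2 ;$$

$$V(Y) = \sum_{j=1}^{m} p_{\bullet j} \left( Y_j - E(Y) \right)^2 = \sum_{i=1}^{k} \sum_{j=1}^{m} p_{ij} \left( Y_j - E(Y) \right)^2 ;$$

$$M_p(X) = \sum_{i=1}^{k} p_{i\bullet} X_i^p = \sum_{i=1}^{k} \sum_{j=1}^{m} p_{ij} X_i^p ;$$

$$M_p(Y) = \sum_{j=1}^{m} p_{\bullet j} Y_j^p = \sum_{i=1}^{k} \sum_{j=1}^{m} p_{ij} Y_j^p .$$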

When considering couples, the cumulative distribution function is:

$$F(x, y) = P(X < x, Y < y) = P((X, Y) \in (-\infty, x) \times (-\infty, y))$$

and the probability density is:

$$f(x, y) = \frac{\partial^2 F}{\partial x \, \partial y}(x, y) .$$

Definition 1.4 (covariance of a finite population).- The covariance of $X$ and
$Y$ is:

$$\mathrm{Cov}(X, Y) = \frac{1}{N} \sum_{n=1}^{N} \left( X(\omega_n) - E(X) \right) \left( Y(\omega_n) - E(Y) \right) .$$ ■

Proposition 1.6.- We have:

$$\mathrm{Cov}(X, Y) = E(XY) - E(X) E(Y) .$$ ■
Proof.- Since:

$$\left( X(\omega_n) - E(X) \right) \left( Y(\omega_n) - E(Y) \right) = X(\omega_n) Y(\omega_n) - E(X) Y(\omega_n) - E(Y) X(\omega_n) + E(X) E(Y) ,$$

we have:

$$\mathrm{Cov}(X, Y) = \frac{1}{N} \left[ \sum_{n=1}^{N} X(\omega_n) Y(\omega_n) - E(X) \sum_{n=1}^{N} Y(\omega_n) - E(Y) \sum_{n=1}^{N} X(\omega_n) + E(X) E(Y) \sum_{n=1}^{N} 1 \right]$$

and:

$$\mathrm{Cov}(X, Y) = E(XY) - E(X) E(Y) - E(Y) E(X) + E(X) E(Y) = E(XY) - E(X) E(Y) .$$ ■

Proposition 1.7.- We have:

$$V(\alpha X + \beta Y) = \alpha^2 V(X) + \beta^2 V(Y) + 2 \alpha \beta \, \mathrm{Cov}(X, Y) .$$ ■

Proof.- Since:

$$(\alpha X + \beta Y)^2 = \alpha^2 X^2 + \beta^2 Y^2 + 2 \alpha \beta XY ,$$

we have:

$$E\left( (\alpha X + \beta Y)^2 \right) = \alpha^2 E(X^2) + \beta^2 E(Y^2) + 2 \alpha \beta E(XY) .$$

In addition,

$$E(\alpha X + \beta Y) = \alpha E(X) + \beta E(Y) ,$$

so that:

$$[E(\alpha X + \beta Y)]^2 = \alpha^2 [E(X)]^2 + \beta^2 [E(Y)]^2 + 2 \alpha \beta E(X) E(Y)$$

and:

$$V(\alpha X + \beta Y) = \alpha^2 \left( E(X^2) - [E(X)]^2 \right) + \beta^2 \left( E(Y^2) - [E(Y)]^2 \right) + 2 \alpha \beta \left( E(XY) - E(X) E(Y) \right) ,$$

which concludes the proof. ■
Corollary 1.2.- Let $X : \Omega \to \mathbb{R}$ and $Y : \Omega \to \mathbb{R}$ be two numerical
characteristics on $\Omega$. Then $|\mathrm{Cov}(X, Y)| \le \sqrt{V(X)} \sqrt{V(Y)}$. ■

Proof.- Let $\alpha \in \mathbb{R}$. Let us consider the second degree polynomial
$f(\alpha) = \alpha^2 V(X) + 2 \alpha \, \mathrm{Cov}(X, Y) + V(Y)$. From the preceding proposition:
$f(\alpha) = V(\alpha X + Y) \ge 0$, $\forall \alpha \in \mathbb{R}$. Thus,

$$\Delta = [2 \, \mathrm{Cov}(X, Y)]^2 - 4 V(X) V(Y) \le 0 ,$$

so that:

$$[\mathrm{Cov}(X, Y)]^2 \le V(X) V(Y)$$

and we have the result. ■
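
Propositions 1.6 and 1.7 and Corollary 1.2 can be checked numerically on a small paired sample. The four pairs below are chosen only for illustration and the variable names are hypothetical.

```matlab
X = [1 2 3 4];  Y = [2 1 4 3];           % paired values (illustrative)
N = length(X);
EX = sum(X)/N;  EY = sum(Y)/N;
VX = sum((X - EX).^2)/N;
VY = sum((Y - EY).^2)/N;
covXY = sum((X - EX).*(Y - EY))/N;       % Definition 1.4
cov_alt = sum(X.*Y)/N - EX*EY;           % Proposition 1.6
VS = sum((X + Y - EX - EY).^2)/N;        % V(X + Y), computed directly
VS_alt = VX + VY + 2*covXY;              % Proposition 1.7 with alpha = beta = 1
cs_ok = abs(covXY) <= sqrt(VX)*sqrt(VY); % Corollary 1.2
```

For these values the covariance is 0.75, both variances are 1.25 and V(X + Y) is 4 by either route.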


1.5. Matlab implementation 

Assuming that the data is summarized in two vectors $X = (X_1, ..., X_k)$,
$Y = (Y_1, ..., Y_m)$ and the absolute number of occurrences is given by
$nXY = (n_{ij} : 1 \le i \le k, 1 \le j \le m)$, we obtain the contingency table by
using:

Listing 1.6. Generating the contingency table

function pXY = abs2relfreq(nXY)
%
% generates the relative frequencies pXY from the
% absolute frequencies nXY
%
% IN:
% nXY: table of absolute frequencies (numbers of occurrences): type integer
%
% OUT:
% pXY: table of relative frequencies: type double
%
ns = sum(sum(nXY));
pXY = nXY/ns;
return;
end

The marginal distributions may be obtained as follows (for instance,
[pX, pY] = cont2marg(pXY, size(X), size(Y));):

Listing 1.7. Generation of marginal values

function [cX, cY] = cont2marg(cXY, sizeX, sizeY)
%
% generates the marginal values for each variable
% from the contingency table of the pair
%
% IN:
% cXY: contingency table of the pair: type array of integer or double
% sizeX: dimensions of X: type integer
% sizeY: dimensions of Y: type integer
%
% OUT:
% cX: table of marginal values for X: same type as cXY
% cY: table of marginal values for Y: same type as cXY
%
cX = reshape(sum(cXY, 2), sizeX);   % sum over the values of Y
cY = reshape(sum(cXY, 1), sizeY);   % sum over the values of X
return;
end


and these results may be used in order to obtain statistics from X or Y by
using the Matlab programs of section 1.3. The cdf of the pair may be obtained
by using (the results may be visualized by using surf(x, y, FXY');):

Listing 1.8. Cdf of a contingency table

function FXY = cont2cdf(X, Y, pXY, x, y)
%
% determines the cdf of the pair (X,Y) from the
% contingency table pXY giving the relative frequencies
%
% IN:
% X: values of X - type array of double
% Y: values of Y - type array of double
% pXY: relative frequencies of the pair - type array of double
%      pXY(i,j) = frequency of (X(i),Y(j))
% x: abscissa for the calculation of the cdf - type array of double
% y: abscissa for the calculation of the cdf - type array of double
%
% OUT:
% FXY: table containing the values of the cdf - type array of double
%      FXY(i,j) = P(X < x(i), Y < y(j))
%
nx = length(x);
ny = length(y);
FXY = zeros(nx, ny);
for i = 1:nx
    indx = find(X < x(i));           % values of X strictly below x(i)
    if isempty(indx)
        FXY(i,:) = 0;
    else
        for j = 1:ny
            indy = find(Y < y(j));   % values of Y strictly below y(j)
            if isempty(indy)
                FXY(i,j) = 0;
            else
                subpXY = pXY(indx, indy);
                FXY(i,j) = sum(sum(subpXY));
            end;
        end;
    end;
end;
return;
end


The statistics of the table may be obtained from the relative frequencies
$pXY = (p_{ij} : 1 \le i \le k, 1 \le j \le m)$ as follows:

Listing 1.9. Statistics

function [covXY,mX,mY,vX,vY] = cont2stat(pXY, X, Y)
%
% generates the statistics of the contingency table
%
% IN:
% X: values of X - type array of double
% Y: values of Y - type array of double
% pXY: relative frequencies of the pair - type array of double
%      pXY(i,j) = frequency of (X(i),Y(j))
%
% OUT:
% covXY: the covariance of the pair - type double
% mX: the mean of X - type double
% mY: the mean of Y - type double
% vX: the variance of X - type double
% vY: the variance of Y - type double
%
XX = reshape(X,[1,length(X)]);    % X as a row vector
YY = reshape(Y,[length(Y),1]);    % Y as a column vector
mX = sum(XX*pXY);                 % mean of X
vX = sum(((XX - mX).^2)*pXY);     % variance of X
mY = sum(pXY*YY);                 % mean of Y
vY = sum(pXY*(YY - mY).^2);       % variance of Y
covXY = XX*pXY*YY - mX*mY;        % covariance: E(XY) - E(X)E(Y)
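
A small self-contained check of these formulas may be written as follows. The 2 x 2 contingency table is illustrative, the names are hypothetical, and the statistics are computed through the marginal distributions rather than by calling the listing above.

```matlab
X = [0 1];  Y = [0 1];                   % values of X and Y (illustrative)
pXY = [0.1 0.2; 0.3 0.4];                % pXY(i,j) = P(X = X(i), Y = Y(j))
pX = sum(pXY, 2)';                       % marginal of X: sums over j
pY = sum(pXY, 1);                        % marginal of Y: sums over i
mX = sum(pX.*X);  mY = sum(pY.*Y);       % means from the marginals
vX = sum(pX.*(X - mX).^2);               % variance of X
vY = sum(pY.*(Y - mY).^2);               % variance of Y
covXY = X*pXY*Y' - mX*mY;                % E(XY) - E(X)E(Y)
```

Here the marginals are (0.3, 0.7) and (0.4, 0.6), the means are 0.7 and 0.6, and the covariance is 0.4 - 0.42 = -0.02.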


In some situations, the contingency table is not given, but only the values of the
pair (X, Y). By assuming that the data is given by an ns x 2 table of values
$XYs = (X_i, Y_i)$, $1 \le i \le ns$ (one line of XYs contains a value of the pair),
the statistics may be obtained as follows:

Listing 1.10. Statistics from a sample

mX = mean(XYs(:,1));   % mean of X
mY = mean(XYs(:,2));   % mean of Y
C = cov(XYs,1);        % covariance matrix of (X, Y)
vX = C(1,1);           % variance of X
vY = C(2,2);           % variance of Y
covXY = C(1,2);        % covariance of (X, Y)
C0 = cov(XYs);         % unbiased estimate of the covariance matrix of (X, Y)
vX0 = C0(1,1);         % unbiased estimate of the variance of X
vY0 = C0(2,2);         % unbiased estimate of the variance of Y
covXY0 = C0(1,2);      % unbiased estimate of the covariance of (X, Y)

In this case, the cdf of the pair may be obtained by using:

Listing 1.11. Cdf from a sample

function FXY = sample2cdf(XYs, x, y)
%
% determines the cdf of the pair (X, Y) from the
% sample values XYs
%
% IN:
% XYs: table ns x 2 of values of the pair - type array of double
%      XYs(i,:) contains one pair (X, Y)
% x: abscissa for the calculation of the cdf - type array of double
% y: abscissa for the calculation of the cdf - type array of double
%
% OUT:
% FXY: table containing the values of the cdf - type array of double
%      FXY(i,j) = P(X < x(i), Y < y(j))
%
nx = length(x);
ny = length(y);
FXY = zeros(nx, ny);
ns = size(XYs, 1);
for i = 1:nx
    xx = x(i);
    for j = 1:ny
        yy = y(j);
        % equivalent explicit loop:
        % s = 0;
        % for k = 1:ns
        %     if (XYs(k,1) < xx && XYs(k,2) < yy)
        %         s = s + 1;
        %     end;
        % end;
        % FXY(i,j) = s;
        ind = find(XYs(:,1) < xx & XYs(:,2) < yy);
        FXY(i,j) = numel(ind);
    end;
end;
FXY = FXY/ns;
return;
end


In all the situations, the probability density may be evaluated by using a 
method analogous to the one used in section 1.3: 

Listing 1.12. Pdf from a cdf

function fXY = cdf2pdf(FXY, x, y, h)
%
% determines the density function associated with the cumulative function
% FXY evaluated at the abscissa (x,y).
% fXY is determined by particle derivative.
%
% IN:
% FXY: values of the cdf - type array of double
% x: vector of the abscissa - type array of double
% y: vector of the abscissa - type array of double
% h: radius of influence - type double
%    defines how many neighbour points are used in the differentiation
%    for equally spaced data, h = a small (1 to 5) multiple of the step
%    often 3 is used
%    the larger h is, the smoother the pdf is.
%
% OUT:
% fXY: table containing the values of the pdf - type array of double
%
g = @(y,x) exp(-0.5*((y-x)/h)^2)/(h*sqrt(2*pi));  % Gaussian kernel
dg = @(y,x) -((y-x)/h)*(1/h)*g(y,x);              % its derivative
nx = length(x);
ny = length(y);
vx = zeros(nx,1);                  % kernel sums along x
for i = 1:nx
    s1 = 0;
    for j = 1:nx
        s1 = s1 + g(x(j),x(i));
    end;
    vx(i) = s1;
end;
vy = zeros(ny,1);                  % kernel sums along y
for i = 1:ny
    s1 = 0;
    for j = 1:ny
        s1 = s1 + g(y(j),y(i));
    end;
    vy(i) = s1;
end;
%
fX = zeros(nx, ny);                % derivative of FXY with respect to x
for jjjc = 1:ny
    for i = 1:nx
        s1 = 0;
        s2 = 0;
        for j = 1:nx
            aux = dg(x(j),x(i))/vx(j);
            s1 = s1 + aux*(x(j) - x(i));
            s2 = s2 + aux*(FXY(j,jjjc) - FXY(i,jjjc));
        end;
        fX(i,jjjc) = s2/s1;
    end;
end;
%
fXY = zeros(nx, ny);               % derivative of fX with respect to y
for iiic = 1:nx
    for i = 1:ny
        s1 = 0;
        s2 = 0;
        for j = 1:ny
            aux = dg(y(j),y(i))/vy(j);
            s1 = s1 + aux*(y(j) - y(i));
            s2 = s2 + aux*(fX(iiic,j) - fX(iiic,i));
        end;
        fXY(iiic,i) = s2/s1;
    end;
end;
return;
end


An example is provided below (Figure 1.2): we generate a random sample
of 10000 variates from a Gaussian vector N(0, Id) of dimension 2 and we
determine both the cdf and the pdf by using these programs.

[Figure 1.2: image not reproduced]

Figure 1.2. Results for a sample of the Gaussian vector, with h = 0.3. For a
color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip


1.6. Hilbertian properties of the numerical characteristics 

Let us introduce the set of the numerical characteristics on $\Omega$:

$$\mathcal{C}(\Omega) = \{X : \Omega \to \mathbb{R}\}.$$

The set of the simple numerical characteristics on $\Omega$ is:

$$\mathcal{V}(\Omega) = \{X \in \mathcal{C}(\Omega) : X(\Omega) \text{ is finite}\}.$$

Both $\mathcal{C}(\Omega)$ and $\mathcal{V}(\Omega)$ are linear spaces (i.e. vector spaces). Since $\Omega$ is finite,
they coincide: we have $\mathcal{V}(\Omega) = \mathcal{C}(\Omega)$. Later, we will examine more general
situations where these two sets do not coincide.

For $X, Y \in \mathcal{V}(\Omega)$, let us define:

$$(X, Y) = E(XY). \tag*{[1.3]}$$




Then:

Lemma 1.3.- $(\cdot, \cdot)$ is a scalar product on $\mathcal{V}(\Omega)$. ■

PROOF.- The definition above corresponds to a bilinear symmetric form on
$\mathcal{V}(\Omega)$. Moreover, $(X, X) = E(X^2) \geq 0$ for any $X \in \mathcal{V}(\Omega)$. Finally, if
$(X, X) = E(X^2) = 0$, then $X = 0$, and the definition corresponds to a
positive-definite form. ■

Let us denote by $L^2(\Omega)$ the completion of $\mathcal{V}(\Omega)$ for the scalar product
given by equation [1.3]: $L^2(\Omega)$ is a Hilbert space for the scalar product [1.3].
The norm of an element $X \in L^2(\Omega)$ is:

$$\|X\| = \sqrt{E(X^2)}. \tag*{[1.4]}$$

The Hilbertian structure of $L^2(\Omega)$ makes it possible to use results and
methods from Hilbert space theory. We give particular attention to the concept
of orthogonal projection on a linear subspace:

Definition 1.5.- Let us consider a non-null linear subspace $S \subset L^2(\Omega)$ and
$X \in L^2(\Omega)$. The orthogonal projection of $X$ onto $S$ is the element $PX \in S$
such that:

$$PX = \arg\min \{\|X - s\| : s \in S\}. \quad ■$$

We have:

Proposition 1.8.- If $S$ is closed then $PX$ exists and is uniquely defined. ■

PROOF.- See [DE 08]. ■

Corollary 1.3.- If $S$ is a finite dimensional linear subspace then $PX$ exists
and is uniquely determined. ■

PROOF.- Since $S$ is finite dimensional, it is closed (see [DE 08]) and the result
follows from the preceding one. ■

Proposition 1.9.- $PX$ is the orthogonal projection of $X$ onto $S$ if and only
if $PX \in S$ and $X - PX$ is orthogonal to $S$, i.e.,

$$PX \in S \text{ and } (X - PX, s) = 0, \quad \forall s \in S. \quad ■$$

PROOF.- See [DE 08]. ■

If we are interested in vectors of numerical characteristics, the elements
above extend directly to this situation by considering product spaces. For
instance, if we are interested in vectors $X = (X_1, \ldots, X_k)$, we consider
$[\mathcal{V}(\Omega)]^k$ and

$$(X, Y) = E(X \cdot Y) = \sum_{i=1}^{k} E(X_i Y_i). \tag*{[1.5]}$$

In this case,

$$\|X\| = \sqrt{\sum_{i=1}^{k} E(X_i^2)}. \tag*{[1.6]}$$

All the approaches mentioned afterwards extend to vectors of numerical
characteristics in this manner.
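The characterization of Proposition 1.9 can be illustrated numerically on a finite universe. The sketch below is a minimal example under stated assumptions: the four probabilities, the numerical characteristic X and the two basis elements of S are hypothetical values chosen only for the illustration. The projection is obtained by writing the orthogonality conditions as a linear system for the coordinates of PX.

```matlab
% Finite universe with 4 outcomes; the probabilities are hypothetical values.
p  = [0.1; 0.2; 0.3; 0.4];
ip = @(U, V) sum(p .* U .* V);     % scalar product (U, V) = E(UV)

X  = [1; -2; 0.5; 3];              % numerical characteristic to be projected
S  = [ones(4, 1), [0; 1; 2; 3]];   % the columns of S span the subspace

% Orthogonality conditions (X - PX, s) = 0 for each basis element s.
G  = S' * diag(p) * S;             % Gram matrix, G(i,j) = (s_i, s_j)
r  = S' * (p .* X);                % right-hand side, r(i) = (X, s_i)
c  = G \ r;                        % coordinates of PX in the basis
PX = S * c;                        % orthogonal projection of X

% The residual is orthogonal to both basis elements, up to rounding.
res = [ip(X - PX, S(:, 1)), ip(X - PX, S(:, 2))];
disp(res);
```

Any other element of the subspace, for instance the one obtained by perturbing c, should be farther from X than PX in the norm [1.4].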


1.6.1. Conditional probability and approximation

Let $A \subset \Omega$ and consider the following numerical characteristic:

$$1_A(\omega) = 1, \text{ if } \omega \in A; \qquad 1_A(\omega) = 0, \text{ if } \omega \notin A.$$

$1_A$ is the characteristic function of the subset $A$ and we have:

$$P(1_A = 1) = P(A), \qquad P(1_A = 0) = P(\Omega - A).$$

Let us consider a second subset $B \subset \Omega$ and denote by $1_B$ its characteristic
function. We may consider the approximation of $1_B$ in terms of $1_A$: for
instance, we may look for the coefficient $\lambda \in \mathbb{R}$ such that $\lambda 1_A$ is the closest
possible to $1_B$. Thus, we may look for the orthogonal projection of $1_B$ onto a
linear subspace having dimension 1 and given by:

$$S = \{Z \in L^2(\Omega, P) : Z(\omega) = s\,1_A(\omega),\ s \in \mathbb{R},\ \forall \omega \in \Omega\}.$$

The solution is the orthogonal projection of $1_B$ onto this linear subspace.
Assume that $P(A) = 0$ (i.e. that $A$ is empty): in this case, $1_A$ is the null
function, i.e. $1_A(\omega) = 0$, $\forall \omega \in \Omega$. Thus, $S$ is formed by the single element 0:
$S = \{0\}$ and the orthogonal projection is 0. We have $\lambda = 0$. Now, assume that
$P(A) > 0$ (i.e. that $A$ is non-empty). Recall that the orthogonal projection is
characterized by:

$$(1_B - \lambda 1_A, s 1_A) = 0, \quad \forall s \in \mathbb{R}.$$

Thus, by taking $s = 1$, we have:

$$\lambda\,(1_A, 1_A) = (1_B, 1_A).$$

We have:

$$1_A 1_B(\omega) = 1, \text{ if } \omega \in A \cap B; \qquad 1_A 1_B(\omega) = 0, \text{ if } \omega \notin A \cap B.$$

Thus, on the one hand, $1_A 1_B = 1_{A \cap B}$ and, on the other hand, $1_A 1_A = 1_A$.
So, we obtain:

$$\lambda P(A) = P(A \cap B) \implies \lambda = \frac{P(A \cap B)}{P(A)}.$$

This value of $\lambda$ is called the probability of B conditional to A or
conditional probability of B with respect to A and it is denoted by $P(B \mid A)$.
Thus,

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} \text{ if } P(A) \neq 0; \qquad P(B \mid A) = 0 \text{ otherwise.} \tag*{[1.7]}$$

In an analogous way, we have:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \text{ if } P(B) \neq 0; \qquad P(A \mid B) = 0 \text{ otherwise.}$$

With these definitions, we have:

$$P(A \cap B) = P(B \mid A)\,P(A) = P(A \mid B)\,P(B). \tag*{[1.8]}$$
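The interpretation of the conditional probability as a projection coefficient can be checked on a small example. The sketch below assumes a fair die as the universe and two illustrative events A and B; these choices are not taken from the text. It compares the coefficient obtained from the scalar products with the ratio of equation [1.7].

```matlab
% Fair die: Omega = {1,...,6} with uniform probability (illustrative choice).
omega = (1:6)';
p     = ones(6, 1) / 6;
ip    = @(U, V) sum(p .* U .* V);      % scalar product (U, V) = E(UV)

oneA = double(mod(omega, 2) == 0);     % characteristic function of A = {2, 4, 6}
oneB = double(omega >= 4);             % characteristic function of B = {4, 5, 6}

% Projection coefficient of 1_B onto the line spanned by 1_A.
lambda = ip(oneB, oneA) / ip(oneA, oneA);

% Direct evaluation of P(A n B) / P(A).
PBgivenA = sum(p(oneA & oneB)) / sum(p(logical(oneA)));

fprintf('lambda = %g, P(B|A) = %g\n', lambda, PBgivenA);
```

Here A ∩ B = {4, 6}, so both quantities should equal (2/6)/(3/6) = 2/3.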

$A$ and $B$ are said to be independent if and only if $P(A \mid B) = P(A)$ or
$P(B \mid A) = P(B)$. If $A$ and $B$ are independent, then $P(A \cap B) = P(A)\,P(B)$.

1.6.2. Expectation and approximation of a constant

When we are looking for the best approximation of $X$ by a constant, we
may determine the value $m \in \mathbb{R}$ such that:

$$m = \arg\min \{\|X - \lambda\| : \lambda \in \mathbb{R}\}.$$

$m$ is an orthogonal projection onto a linear subspace $S$ having dimension 1:

$$S = \{Z \in L^2(\Omega) : Z \text{ is constant}: Z(\omega) = \lambda \in \mathbb{R},\ \forall \omega \in \Omega\}.$$

We have:

$$(X - m, \lambda) = 0, \quad \forall \lambda \in \mathbb{R},$$

so that:

$$\lambda E(X) = \lambda m,\ \forall \lambda \in \mathbb{R} \iff m = E(X).$$

In addition,

$$\|X - m\| = \sqrt{E\left((X - E(X))^2\right)} = \sqrt{V(X)},$$

and the norm of the error of the approximation is the square root of the variance
of $X$, i.e. the standard deviation of $X$.
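The minimizing property of the mean can be observed on data. The sketch below is only an illustration: the sample is a set of hypothetical values and the minimization is done by brute force on a grid of constants, which is not how one would compute the mean in practice.

```matlab
Xs = [2 4 4 4 5 5 7 9];                        % hypothetical sample of X
err = @(lambda) sqrt(mean((Xs - lambda).^2));  % empirical norm ||X - lambda||

m = mean(Xs);                                  % candidate minimizer
lambdas = linspace(m - 2, m + 2, 401);         % grid of constants around m
errs = arrayfun(err, lambdas);
[emin, imin] = min(errs);

fprintf('best constant on the grid: %g (mean = %g)\n', lambdas(imin), m);
fprintf('minimal error: %g (std = %g)\n', emin, std(Xs, 1));
```

The best constant on the grid should coincide with the mean, and the minimal error with the standard deviation normalized by the sample size, std(Xs, 1).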

For vectors of numerical characteristics, the best approximation of
$X = (X_1, \ldots, X_k)$ by a constant vector is $m = E(X) = (m_1, \ldots, m_k)$,
where $m_i = E(X_i)$. The error in the approximation is
$\|X - m\| = \sqrt{\sum_{i=1}^{k} V(X_i)}$.

The values of the mean and of the variance of data vectors may be easily
obtained by using Matlab. Assuming that the data is given by a table
$Xs = (Xs_{ij} : 1 \leq i \leq k,\ 1 \leq j \leq ns)$, so that each column of Xs contains
a variate from $X$, we may use:

Listing 1.13. Statistics of a data vector

mX = mean(Xs,2); % vector of the means of the rows
vX = var(Xs,1,2); % vector of the variances of the rows
vX0 = var(Xs,0,2); % unbiased estimates of the variances of the rows


1.6.3. Linear correlation and affine approximation

Now, let us consider the best approximation of the numerical characteristic
$Y$ by an affine function of the numerical characteristic $X$. In this case, we may
look for the parameters $a, b \in \mathbb{R}$ such that:

$$aX + b = \arg\min \{\|Y - Z\| : Z = \alpha X + \beta;\ \alpha, \beta \in \mathbb{R}\}.$$

Here, the solution is also the orthogonal projection onto a linear subspace
$S$, having dimension 2 and given by:

$$S = \{s \in L^2(\Omega) : s = \alpha X + \beta;\ \alpha, \beta \in \mathbb{R}\}.$$

We have:

$$(Y - aX - b, \alpha X + \beta) = 0, \quad \forall \alpha, \beta \in \mathbb{R}.$$

Let us take successively $(\alpha, \beta) = (1, 0)$ and $(\alpha, \beta) = (0, 1)$. We obtain:

$$aE(X^2) + bE(X) = E(XY); \qquad aE(X) + b = E(Y).$$

The solution of this linear system of two variables is:

$$a = \frac{\mathrm{Cov}(X, Y)}{V(X)}, \qquad b = E(Y) - aE(X).$$

In this case,

$$\|Y - aX - b\| = \sqrt{V(Y)\left(1 - [\rho(X, Y)]^2\right)},$$

where:

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{V(X)\,V(Y)}}.$$

$\rho(X, Y)$ is the linear correlation coefficient between $X$ and $Y$. We have
$|\rho(X, Y)| \leq 1$ and the error is null if and only if $|\rho(X, Y)| = 1$.

For vectors of numerical characteristics, the best approximation of
$Y = (Y_1, \ldots, Y_k)$ by an affine function of $X = (X_1, \ldots, X_m)$ is given by
multilinear regression: $Y \approx AX + B$, where $A \in \mathcal{M}(k, m)$ and
$B \in \mathcal{M}(k, 1)$ verify, for $1 \leq i \leq k$ and $1 \leq r \leq m$,

$$\sum_{j=1}^{m} A_{ij} E(X_j) + B_i = E(Y_i) \quad (k \text{ equations}),$$

$$\sum_{j=1}^{m} A_{ij} E(X_j X_r) + B_i E(X_r) = E(Y_i X_r) \quad (km \text{ equations}).$$

Thus, $A$ is the solution of the linear system:

$$\sum_{j=1}^{m} A_{ij}\,\mathrm{Cov}(X_j, X_r) = \mathrm{Cov}(Y_i, X_r).$$

This system may be solved for a given $i$ by considering
$a = (a_1, \ldots, a_m)^t \in \mathcal{M}(m, 1)$ such that $a_j = A_{ij}$, $C \in \mathcal{M}(m, m)$ such that
$C_{rj} = \mathrm{Cov}(X_j, X_r)$ and $b = (b_1, \ldots, b_m)^t \in \mathcal{M}(m, 1)$ such that
$b_r = \mathrm{Cov}(Y_i, X_r)$. Then:

$$C.a = b.$$

Once $A$ has been determined, $B$ is obtained from the first set of equations:

$$B = E(Y) - A.E(X).$$
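The system C.a = b can be assembled directly from data and compared with an ordinary least-squares fit. The sketch below is a minimal illustration for a scalar Y: the data values are hypothetical and Y is built as an exact affine function of X so that the expected coefficients are known in advance.

```matlab
% Hypothetical data: m = 2 explanatory variables, ns = 5 variates, one per column.
Xs = [0 1 2 3 4;
      1 0 2 1 3];
Ys = 2*Xs(1, :) - Xs(2, :) + 0.5;    % exact affine relation, used as a check

ns = size(Xs, 2);
mX = mean(Xs, 2);
mY = mean(Ys);
Xc = Xs - repmat(mX, 1, ns);         % centered data

C = (Xc * Xc') / ns;                 % C(r,j) = Cov(X_j, X_r)
b = (Xc * (Ys - mY)') / ns;          % b(r)   = Cov(Y, X_r)
a = C \ b;                           % coefficients of the affine function
B = mY - a' * mX;                    % constant term, B = E(Y) - a'.E(X)

% Cross-check against a direct least-squares fit with an intercept column.
coef = [Xs', ones(ns, 1)] \ Ys';
disp([a; B]');
disp(coef');
```

Both computations should recover the coefficients (2, -1) and the constant 0.5 used to build Ys.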


1.6.4. Matlab implementation 

Assuming that the data is given by an ns x 2 table of values
$XYs = (X_i, Y_i)$, $1 \leq i \leq ns$ (one row of XYs contains a value of the pair),
the linear correlation coefficient and the coefficients of the
approximation of Y by an affine function of X may be obtained as follows:

Listing 1.14. Approximation by an affine function - univariate scalar

mX = mean(XYs(:,1)); % mean of X
mY = mean(XYs(:,2)); % mean of Y
C = cov(XYs,1); % covariance matrix of (X, Y)
vX = C(1,1); % variance of X
vY = C(2,2); % variance of Y
covXY = C(1,2); % covariance of X and Y
a = covXY/vX; % first coefficient
b = mY - a*mX; % second coefficient
rhoXY = covXY/sqrt(vX*vY); % linear correlation
err = sqrt(vY*(1 - rhoXY^2)); % error in the approximation


An example is given (Figure 1.3): we generate a random sample of 250
variates from a Gaussian vector N(0, Id) of dimension 2 and we determine
the coefficients. Since the variables X and Y are uncorrelated, the result is,
up to sampling error, a constant equal to zero. In order to obtain correlated
data, we consider a new table XYsort, generated by sorting the columns of XYs
in ascending order. In this case, the variables are correlated and the
approximation is better.

[Figure 1.3: image not reproduced]

Figure 1.3. Results for a sample of the Gaussian vector. On the left, the
uncorrelated data XYs leads to a constant. On the right, the results for the
correlated data XYsort. For a color version of the figure, see
www.iste.co.uk/souzadecursi/quantification.zip


Let us consider the situation where Y has to be approximated by an affine
function of the vector $X = (X_1, \ldots, X_m)$. We assume that the data is given by
an (m + 1) x ns table of values $XYs = (XYs_{ij} : 1 \leq i \leq m + 1,\ 1 \leq j \leq ns)$,
so that each column of XYs contains a value of the pair (X, Y): the rows
$1 \leq i \leq m$ contain the components of the variate from X, while the row
m + 1 contains the corresponding value of Y. In this case, the coefficients of
the approximation of Y by an affine function of X are given by the following
Matlab program:

Listing 1.15. Approximation by an affine function - multivariate scalar

function [a,b,Ye] = multivar_scalar(XYs,m)
aux = cov(XYs',1); % covariance matrix of (X, Y)
C = aux(1:m,1:m); % covariance matrix of X
B = aux(1:m,m+1); % second member
a = C\B; % vector of coefficients
aux = mean(XYs,2); % vector of the means of the rows
mX = aux(1:m);
mY = aux(m+1);
b = mY - a'*mX; % second coefficient
Ye = a'*XYs(1:m,:) + b; % estimation
return;
end


Let us illustrate this approach: we generate X as a random sample of 250
variates from a Gaussian vector N(0, Id) of dimension 2 and we consider
$Y = X_1 - X_2 + 2 + 0.01\,U$, where $U$ is uniformly distributed on $(-1, 1)$.
The results are shown in Figure 1.4.

[Figure 1.4: plot of the data and of the approximated values, titled "absolute error = 0.0092426, relative error = 0.04 %"]

Figure 1.4. Results for the multivariate approximation of a scalar random
variable. For a color version of the figure, see
www.iste.co.uk/souzadecursi/quantification.zip

The approximation of a vector Y by an affine function of X is made by
using this method for each component $Y_i$. For instance, we may use the
following code:

Listing 1.16. Approximation by an affine function - multivariate vector

function [A,B,Ye] = multivar_vector(XYs,m,k)
ns = size(XYs,2); % number of variates (one per column)
Ye = zeros(k,ns);
A = zeros(m,k);
B = zeros(k,1);
for i = 1:k
    XXYY = [XYs(1:m,:); XYs(m+i,:)]; % rows of X followed by the row of Y_i
    [a,b,YYe] = multivar_scalar(XXYY,m);
    A(:,i) = a;
    B(i) = b;
    Ye(i,:) = YYe;
end;
% same estimation, recomputed from A and B
YYe = A'*XYs(1:m,:);
for i = 1:k
    YYe(i,:) = YYe(i,:) + B(i);
end;
return;
end


An example is given in Figure 1.5: data X is a random sample of 250
variates from a Gaussian vector N(0, Id) of dimension 2 and
$Y_1 = X_1 - X_2 + 2 + 0.01\,U_1$, $Y_2 = -2X_1 + 3X_2 + 1 + 0.01\,U_2$, where $U_1$ and $U_2$
are independent, uniformly distributed on $(-1, 1)$.

[Figure 1.5: two panels, each titled "absolute error = 0.00045504, relative error = 0.02 %"]

Figure 1.5. Results for the multivariate approximation of a random
vector. For a color version of the figure, see
www.iste.co.uk/souzadecursi/quantification.zip

1.6.5. Conditional mean and best approximation

Let us consider the situation where we are looking for an approximation
of Y as a generic function of X: no prior expression of the approximation is
introduced and we must determine:

$$g(X) = \arg\min \{\|Y - Z\| : Z = \varphi(X);\ \varphi : \mathbb{R} \to \mathbb{R}\}.$$

The solution is the orthogonal projection onto:

$$S = \{s \in L^2(\Omega) : s = \varphi(X);\ \varphi : \mathbb{R} \to \mathbb{R}\}.$$

In this case, we have:

$$(Y - g(X), \varphi(X)) = 0, \quad \forall \varphi : \mathbb{R} \to \mathbb{R},$$

i.e., with $p_{ij} = P(X = X_i, Y = Y_j)$,

$$\sum_{i=1}^{k} \sum_{j=1}^{m} p_{ij}\,(Y_j - g(X_i))\,\varphi(X_i) = 0, \quad \forall \varphi : \mathbb{R} \to \mathbb{R}. \tag*{[1.9]}$$

Let us introduce $g_i = g(X_i) \in \mathbb{R}$, $G = (g_1, \ldots, g_k) \in \mathbb{R}^k$, $\varphi_i = \varphi(X_i) \in \mathbb{R}$,
$\Phi = (\varphi_1, \ldots, \varphi_k) \in \mathbb{R}^k$. Equation [1.9] becomes:

$$\sum_{i=1}^{k} \sum_{j=1}^{m} p_{ij}\,(Y_j - g_i)\,\varphi_i = 0, \quad \forall \Phi \in \mathbb{R}^k.$$

Let us consider $\Phi$ such that $\varphi_i = 0$ for $i \neq \ell$ and $\varphi_\ell = 1$. We have:

$$\sum_{j=1}^{m} p_{\ell j}\,(Y_j - g_\ell) = 0 \implies g_\ell = \frac{\sum_{j=1}^{m} p_{\ell j} Y_j}{\sum_{j=1}^{m} p_{\ell j}} = \sum_{j=1}^{m} \frac{p_{\ell j}}{p_{\ell \bullet}}\,Y_j, \qquad p_{\ell \bullet} = \sum_{j=1}^{m} p_{\ell j}.$$

The function $g$ defined above is the conditional mean of Y with respect to
X (or conditional expectation of Y with respect to X). We use the notation
$E(Y \mid X)$ in order to refer to the numerical characteristic $Z = g(X)$ and
$E(Y \mid X = X_\ell)$ in order to refer to the value $g_\ell$. The error in the approximation
is $\|Y - E(Y \mid X)\|$.

We observe that the expression of $g_\ell$ involves the term $p_{\ell j} / p_{\ell \bullet}$, which
defines the conditional distribution of Y with respect to X (the expression
distribution of Y conditional to X may also be found in the literature):

$$P(Y = Y_j \mid X = X_i) = \begin{cases} P(X = X_i, Y = Y_j) / P(X = X_i), & \text{if } P(X = X_i) \neq 0 \\ 0, & \text{otherwise.} \end{cases}$$

We have:

$$E(Y \mid X = X_i) = \sum_{j=1}^{m} P(Y = Y_j \mid X = X_i)\,Y_j.$$

In an analogous way, the conditional distribution of X with respect to Y is:

$$P(X = X_i \mid Y = Y_j) = \begin{cases} P(X = X_i, Y = Y_j) / P(Y = Y_j), & \text{if } P(Y = Y_j) \neq 0 \\ 0, & \text{otherwise,} \end{cases}$$

and we have:

$$E(X \mid Y = Y_j) = \sum_{i=1}^{k} P(X = X_i \mid Y = Y_j)\,X_i.$$

The two numerical characteristics are said to be independent if and only if:

$$\forall i, j : P(X = X_i \mid Y = Y_j) = P(X = X_i) \text{ or } P(Y = Y_j \mid X = X_i) = P(Y = Y_j),$$

i.e.,

$$\forall i, j : P(X = X_i, Y = Y_j) = P(X = X_i)\,P(Y = Y_j).$$

When X and Y are independent, we have $E(Y \mid X) = E(Y)$ and
$E(X \mid Y) = E(X)$.

For vectors of numerical characteristics, the conditional means are
analogously defined: we must simply take into account the existence of
several components: the scalars $g_i$, $X_i$, $Y_j$ become the vectors $\mathbf{g}_i$, $\mathbf{X}_i$, $\mathbf{Y}_j$,
respectively.
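For a finite joint distribution, the conditional mean reduces to a few array operations. The sketch below is an illustration under stated assumptions: the table pXY, the values Xv and Yv and the test function phi are hypothetical. It computes the values $E(Y \mid X = X_i)$ and checks the orthogonality relation [1.9] for one arbitrary choice of $\varphi$.

```matlab
% Hypothetical joint distribution: pXY(i,j) = P(X = Xv(i), Y = Yv(j)).
Xv  = [0; 1; 2];                 % values of X (k = 3)
Yv  = [-1; 1];                   % values of Y (m = 2)
pXY = [0.10 0.20;
       0.30 0.10;
       0.00 0.30];

pX = sum(pXY, 2);                % marginal distribution P(X = Xv(i))

% Conditional distribution P(Y = Yv(j) | X = Xv(i)), set to 0 when pX(i) = 0.
pYgX = zeros(size(pXY));
pos  = pX > 0;
pYgX(pos, :) = pXY(pos, :) ./ repmat(pX(pos), 1, numel(Yv));

g = pYgX * Yv;                   % g(i) = E(Y | X = Xv(i))

% Orthogonality check: E((Y - g(X)) * phi(X)) = 0 for an arbitrary phi.
phi   = [3; -1; 2];              % arbitrary values phi(Xv(i))
resid = sum(sum(pXY .* (repmat(Yv', 3, 1) - repmat(g, 1, 2)) .* repmat(phi, 1, 2)));

disp(g');
disp(resid);
```

With this table, the conditional means are 1/3, -1/2 and 1, and averaging them with the weights pX gives back E(Y) = 0.2.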

1.7. Measure and probability 

The preceding ideas generalize to more general universes $\Omega$ (for instance,
infinite ones, possibly uncountable). The generalization is obtained by using
the concept of measure: let us recall $\mathbb{R}_e = \mathbb{R} \cup \{-\infty, +\infty\}$, the set of the
"extended real numbers". The elements of $\mathbb{R}_e$ are manipulated according to the
following rules:

$$\forall x \in \mathbb{R} : x + (+\infty) = (+\infty) + x = +\infty \text{ and } x + (-\infty) = (-\infty) + x = -\infty$$

$$\forall x \in \mathbb{R} : x - (+\infty) = -(+\infty) + x = -\infty \text{ and } x - (-\infty) = -(-\infty) + x = +\infty$$

$$\forall x > 0 : x.(+\infty) = (+\infty).x = +\infty \text{ and } x.(-\infty) = (-\infty).x = -\infty$$

$$\forall x < 0 : x.(+\infty) = (+\infty).x = -\infty \text{ and } x.(-\infty) = (-\infty).x = +\infty$$

$$0.(+\infty) = 0.(-\infty) = (+\infty).0 = (-\infty).0 = 0$$

$$(+\infty).(+\infty) = +\infty; \quad (-\infty).(-\infty) = +\infty; \quad (-\infty).(+\infty) = (+\infty).(-\infty) = -\infty$$

$$(+\infty) + (+\infty) = +\infty; \quad (-\infty) + (-\infty) = -\infty$$

The expressions $(+\infty) - (+\infty)$, $(-\infty) - (-\infty)$, $(-\infty) + (+\infty)$, $(+\infty) + (-\infty)$
are undetermined. We also use the following order relations:

$$-\infty < +\infty; \quad \forall x \in \mathbb{R} : x < +\infty \text{ and } x > -\infty$$
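When these rules are used in Matlab computations, one point deserves attention. The values Inf and -Inf follow IEEE 754 floating-point arithmetic, which agrees with most of the rules above but not with the convention $0.(\pm\infty) = 0$: the product 0*Inf evaluates to NaN. The sketch below shows the difference and one possible way to enforce the convention of the text; the variable names are only illustrative.

```matlab
s1 = 5 + Inf;          % +Inf, as in the rules above
s2 = -2 * Inf;         % -Inf, as in the rules above
u  = Inf - Inf;        % NaN: the expression is undetermined

% Product 0.(+Inf): IEEE arithmetic gives NaN, the convention of the text gives 0.
x = 0;
y = Inf;
z_ieee = x * y;        % NaN
z = z_ieee;
if x == 0 || y == 0
    z = 0;             % convention 0.(+/-Inf) = 0 of the extended real numbers
end

disp([s1, s2, u, z_ieee, z]);
```

The explicit test on zero factors is a design choice for this sketch; any code that integrates functions taking infinite values on null sets needs some equivalent safeguard.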
Definition 1.6 (measure).- Let $\Omega$ be a non-empty set and $\mathcal{P}(\Omega)$ be the set
of the parts of $\Omega$. A measure on $\Omega$ is a map $\mu : \mathcal{P}(\Omega) \to \mathbb{R}_e$ such
that:

i) $\mu$ is positive: $\mu(A) \geq 0$, $\forall A \subset \Omega$;

ii) $\mu$ is countably additive, i.e. additive for any countable disjoint family:
$\mu\left(\bigcup_{n \in \mathbb{N}} A_n\right) = \sum_{n \in \mathbb{N}} \mu(A_n)$, $\forall \{A_n\}_{n \in \mathbb{N}} \subset \mathcal{P}(\Omega)$ such that $A_i \cap A_j = \emptyset$
for $i \neq j$, $\forall i, j \in \mathbb{N}$;

iii) $\mu(\emptyset) = 0$.

In this case, the pair $(\Omega, \mu)$ is referred to as measure space.

We say that $\mu$ is a finite measure on $\Omega$ when, in addition, $\mu(\Omega) \in \mathbb{R}$, i.e.
$\mu(\Omega) < +\infty$. ■

We have:

Proposition 1.10.- $\mu\left(\bigcup_{i=1}^{n} B_i\right) = \sum_{i=1}^{n} \mu(B_i)$ for any finite disjoint
family ($B_i \cap B_j = \emptyset$ for $i \neq j$, $1 \leq i, j \leq n$). ■

PROOF.- Let us consider the family $\{A_n\}_{n \in \mathbb{N}} \subset \mathcal{P}(\Omega)$ given by:

$$A_i = B_i,\ i \leq n; \qquad A_i = \emptyset,\ i > n.$$

Then $A_i \cap A_j = \emptyset$ for $i \neq j$, $\forall i, j \in \mathbb{N}$, so that, since $A_k = \emptyset$ and
$\mu(A_k) = 0$ for $k > n$:

$$\mu\left(\bigcup_{i=1}^{n} B_i\right) = \mu\left(\bigcup_{k \in \mathbb{N}} A_k\right) = \sum_{k \in \mathbb{N}} \mu(A_k) = \sum_{i=1}^{n} \mu(B_i). \quad ■$$

Corollary 1.4.- Let $\mu$ be a measure on $\Omega$. If $A \subset B \subset \Omega$, then $\mu(A) \leq \mu(B)$. ■

PROOF.- Let us consider $B_1 = A$, $B_2 = B - A$. Then $B = B_1 \cup B_2$ and
$B_1 \cap B_2 = \emptyset$, so that:

$$\mu(B) = \mu(B_1 \cup B_2) = \mu(B_1) + \mu(B_2) = \mu(A) + \mu(B - A) \geq \mu(A). \quad ■$$

Proposition 1.11.- Let $\mu$ be a measure on $\Omega$. If $A \subset \Omega$ verifies $\mu(A) < +\infty$,
then $\mu(A - B) = \mu(A) - \mu(A \cap B)$, $\forall B \subset \Omega$. ■

PROOF.- Let us consider $B_1 = A - B$, $B_2 = A \cap B$. Then $A = B_1 \cup B_2$ and
$B_1 \cap B_2 = \emptyset$, so that

$$\mu(A) = \mu(B_1 \cup B_2) = \mu(B_1) + \mu(B_2) = \mu(A - B) + \mu(A \cap B).$$

Now, $A - B \subset A$ and $A \cap B \subset A$, so that $\mu(A - B) \leq \mu(A) < +\infty$ and
$\mu(A \cap B) \leq \mu(A) < +\infty$, which establishes the result. ■

Proposition 1.12.- Let $\mu$ be a measure on $\Omega$ and $A, B \subset \Omega$. If one of these
sets has a finite measure ($\mu(A) < +\infty$ or $\mu(B) < +\infty$), then
$\mu(A \cup B) = \mu(A) + \mu(B) - \mu(A \cap B)$. ■

PROOF.- Let us assume, without loss of generality, that $\mu(A) < +\infty$. Let us
also consider $B_1 = A - B$, $B_2 = B$. Then $A \cup B = B_1 \cup B_2$ and $B_1 \cap B_2 = \emptyset$,
so that:

$$\mu(A \cup B) = \mu(B_1 \cup B_2) = \mu(B_1) + \mu(B_2) = \mu(A - B) + \mu(B)$$

and the result follows from the preceding proposition. The proof is analogous
if we assume that $\mu(B) < +\infty$ (in such a situation $B_1 = B - A$, $B_2 = A$). ■
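Propositions 1.11 and 1.12 can be checked on the simplest example of a measure, the counting measure on a finite set. The sketch below is only an illustration: the sets A and B are hypothetical and the measure of a set is the number of its distinct elements.

```matlab
% Counting measure on subsets of {1,...,10}; the sets are illustrative.
mu = @(S) numel(unique(S));
A  = [1 2 3 4 5];
B  = [4 5 6 7];

lhs = mu(union(A, B));                        % mu(A u B)
rhs = mu(A) + mu(B) - mu(intersect(A, B));    % mu(A) + mu(B) - mu(A n B)
dif = mu(setdiff(A, B));                      % mu(A - B)

fprintf('mu(A u B) = %d, mu(A) + mu(B) - mu(A n B) = %d\n', lhs, rhs);
fprintf('mu(A - B) = %d, mu(A) - mu(A n B) = %d\n', dif, mu(A) - mu(intersect(A, B)));
```

Here A ∪ B has 7 elements and A − B has 3, in agreement with both propositions.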


Definition 1.7.- Let $\mu$ be a measure on $\Omega$ and $A \subset \Omega$. We say that $A$ is a
$\mu$-null set if and only if $\mu(A) = 0$. In this case, we say that $x \in A$ holds
$\mu$-almost never on $\Omega$ or $\mu$-almost nowhere on $\Omega$. Analogously, we say that
$x \in A$ holds $\mu$-almost always on $\Omega$ or $\mu$-almost everywhere on $\Omega$ if and only
if $\Omega - A$ is a $\mu$-null set. ■

In the situations where no confusion is possible, the measure $\mu$ is not
mentioned and we say simply "null set", "almost always", "almost
everywhere", "almost nowhere", "almost never". These expressions are
usually abbreviated by using "a.a." for "almost always", "a.e." for "almost
everywhere", "a.n." for "almost never" or "almost nowhere".

We focus on a particular family of measures: 

Definition 1.8 (probability).- Let $\mu$ be a measure on $\Omega$. We say that $\mu$ is a
probability on $\Omega$ if and only if $\mu(\Omega) = 1$. We also say that $\mu$ is the probability
distribution on $\Omega$ and that the measure $\mu(A)$ associated with the subset $A \subset \Omega$
is the probability of $A$, which is denoted by $P(x \in A)$ or simply $P(A)$:

$$P(x \in A) = P(A) = \mu(A).$$

In this case, the pair $(\Omega, P)$ is referred to as probability space.

Any finite measure $\nu$ such that $\nu(\Omega) > 0$ generates a probability by the
relation: $\mu(A) = \nu(A) / \nu(\Omega)$.

In probability, we use the term event to refer to the subsets of $\Omega$
to which a probability can be assigned (the set of events forms a
$\sigma$-algebra; this concept is dealt with in the next section): $A \subset \Omega$ is called an event.
Analogously, the properties almost everywhere and almost nowhere are often
mentioned - for instance, in theorems or proofs - under the following form:

Definition 1.9.- Let $P$ be a probability on $\Omega$ and $A \subset \Omega$ an event. We say
that $A$ is $P$-negligible or $P$-almost impossible if and only if $P(A) = 0$.
Conversely, we say that $A$ is $P$-almost sure if and only if $\Omega - A$ is
$P$-negligible (or, equivalently, $P(A) = 1$). ■

As in the case of general measures, the probability $P$ may be dropped and
we may simply say "negligible", "almost impossible", "almost sure". These
expressions may also be used in an abbreviated form: "a.i." for "almost
impossible" and "a.s." for "almost sure".


1.8. Construction of measures 
1.8.1. Measurable sets 

In a finite population, a measure (or a probability) may be easily 
constructed by giving an individual value for each member of the population 
and, then, using the additivity property in order to evaluate the measure of an 
arbitrary subset. For instance, the measure may be equal for any member of 
the population. In the case of a probability measure on a population formed
by N different elements, this leads to the value 1/N for any member, which
corresponds to a uniform distribution.

When considering more general situations, namely those involving
uncountable populations such as, for instance, the set of real numbers, the
problem becomes more complex.

In practice, the definition given in the preceding section cannot be used for
the construction of measures, since it requires the definition (i.e. the
attribution of the numerical value) of the measure for each subset of $\Omega$: for
instance, if we want to use the definition in order to define a measure
corresponding to the area on $\mathbb{R}^2$, it becomes necessary to define a priori the
area of any arbitrary region of $\mathbb{R}^2$, i.e. we must evaluate the area of a
geometrically arbitrary region prior to the definition of what the area of a
region is.

For this reason, a more efficient procedure consists of defining the
measure for a particular family $\mathfrak{B} \subset \mathcal{P}(\Omega)$ of subsets of $\Omega$ - generally of
simple geometry, such as, for instance, rectangles - and extending the
definition to other parts of $\Omega$ by using, on the one hand, the elementary set
operations (union, intersection, difference and complement) applied to
elements of the basic family $\mathfrak{B}$ and, on the other hand, the properties
established in the preceding section. Formally, we use a $\sigma$-algebra:

Definition 1.10.- $\mathcal{A} \subset \mathcal{P}(\Omega)$ is a $\sigma$-algebra on $\Omega$ if and only if:

i) $\emptyset \in \mathcal{A}$;

ii) $A \in \mathcal{A} \implies \Omega - A \in \mathcal{A}$;

iii) the union of any countable family of elements of $\mathcal{A}$ is an element of
$\mathcal{A}$: $\{A_n\}_{n \in \mathbb{N}} \subset \mathcal{A} \implies \bigcup_{n \in \mathbb{N}} A_n \in \mathcal{A}$. ■

This definition implies that: if $\mathcal{A}$ and $\mathcal{B}$ are $\sigma$-algebras on $\Omega$, then $\mathcal{A} \cap \mathcal{B}$ is a
$\sigma$-algebra on $\Omega$. This property extends to arbitrary collections of $\sigma$-algebras
on $\Omega$: the intersection of a collection of $\sigma$-algebras on $\Omega$ is a $\sigma$-algebra on $\Omega$.

For any $\Omega$, $\mathcal{A} = \mathcal{P}(\Omega)$ is a $\sigma$-algebra on $\Omega$. Thus, for any family
$\mathfrak{B} \subset \mathcal{P}(\Omega)$ formed by subsets of $\Omega$, there exists at least one $\sigma$-algebra on $\Omega$
containing $\mathfrak{B}$ (trivially, $\mathcal{A} = \mathcal{P}(\Omega)$ satisfies this condition). Consequently, we
may consider the non-empty family formed by the $\sigma$-algebras on $\Omega$
containing $\mathfrak{B}$. Zorn's lemma (see [DE 08]) shows that this family has a
minimal element $\sigma(\mathfrak{B})$: the smallest $\sigma$-algebra on $\Omega$ containing $\mathfrak{B}$, i.e. the
Borel algebra $\Sigma(\Omega, \mathfrak{B})$, where the measure will be formally defined by
using an extension based on the concepts of external measure and internal
measure:


Definition 1.11.- Let $\mu : \mathfrak{B} \to \mathbb{R}$ be such that:

i) $\mu$ is positive: $\mu(A) \geq 0$, $\forall A \in \mathfrak{B}$;

ii) $\mu(\emptyset) = 0$;

iii) $\mu(\Omega) < +\infty$.

Let $A \subset \Omega$. The external measure of $A$ is:

$$\mu^e(A) = \inf \left\{ \sum_{n \in \mathbb{N}} \mu(B_n) : A \subset \bigcup_{n \in \mathbb{N}} B_n \text{ and } \forall n \in \mathbb{N} : B_n \in \mathfrak{B} \right\}.$$

The internal measure of $A$ is:

$$\mu^i(A) = \mu(\Omega) - \mu^e(\Omega - A).$$

We say that $A$ is measurable if and only if $\mu^i(A) = \mu^e(A)$ and, in
this case, we say that this common value is the measure of $A$, denoted by
$\mu(A)$. ■

We have:

Proposition 1.13.- Let $\{A_n\}_{n \in \mathbb{N}} \subset \Sigma(\Omega, \mathfrak{B})$. Then $\bigcup_{n \in \mathbb{N}} A_n \in \Sigma(\Omega, \mathfrak{B})$
and $\bigcap_{n \in \mathbb{N}} A_n \in \Sigma(\Omega, \mathfrak{B})$. ■

PROOF.- $\bigcup_{n \in \mathbb{N}} A_n \in \Sigma(\Omega, \mathfrak{B})$ from the definition of $\sigma$-algebra on $\Omega$.
Analogously, $\{\Omega - A_n\}_{n \in \mathbb{N}} \subset \Sigma(\Omega, \mathfrak{B})$, so that
$\bigcup_{n \in \mathbb{N}} (\Omega - A_n) \in \Sigma(\Omega, \mathfrak{B})$ and, consequently, $\Omega - \bigcup_{n \in \mathbb{N}} (\Omega - A_n) \in \Sigma(\Omega, \mathfrak{B})$. Now,

$$\Omega - \bigcup_{n \in \mathbb{N}} (\Omega - A_n) = \bigcap_{n \in \mathbb{N}} [\Omega - (\Omega - A_n)] = \bigcap_{n \in \mathbb{N}} A_n,$$

which establishes the result. ■

Corollary 1.5.- Let $\{B_i\}_{1 \leq i \leq n} \subset \Sigma(\Omega, \mathfrak{B})$. Then $\bigcup_{i=1}^{n} B_i \in \Sigma(\Omega, \mathfrak{B})$
and $\bigcap_{i=1}^{n} B_i \in \Sigma(\Omega, \mathfrak{B})$. ■

PROOF.- Let us consider $A_i = B_i$, for $1 \leq i \leq n$, and $A_i = \emptyset$, for $i > n$.
Then $\{A_n\}_{n \in \mathbb{N}} \subset \Sigma(\Omega, \mathfrak{B})$, so that $\bigcup_{i=1}^{n} B_i = \bigcup_{n \in \mathbb{N}} A_n \in \Sigma(\Omega, \mathfrak{B})$. Let
$A_i = B_i$, for $1 \leq i \leq n$, and $A_i = \Omega$, for $i > n$. Then $\{A_n\}_{n \in \mathbb{N}} \subset \Sigma(\Omega, \mathfrak{B})$,
so that $\bigcap_{i=1}^{n} B_i = \bigcap_{n \in \mathbb{N}} A_n \in \Sigma(\Omega, \mathfrak{B})$. ■

We must observe that this procedure does not lead to the attribution of a
measure to each part of $\Omega$, but only to those belonging to the "closure" of the
family $\mathfrak{B}$ for the elementary set operations: some "pathological" subsets may
not belong to this "closure" and are referred to as "not measurable", while
the members of the "closure" are the "measurable" sets. In other words, we
ensure that $\Sigma(\Omega, \mathfrak{B}) \subset \mathcal{P}(\Omega)$, but it may happen that $\mathcal{P}(\Omega) \not\subset \Sigma(\Omega, \mathfrak{B})$.

In order to ensure that $\mathcal{P}(\Omega) = \Sigma(\Omega, \mathfrak{B})$, it is necessary to ensure that
every open part $A \subset \Omega$ may be generated by a sequence of elementary set
operations applied to elements of $\mathfrak{B}$: for instance, $\mathfrak{B}$ must contain a
topological basis, i.e. a family of open sets which generates all the open sets
contained in $\Omega$ by operations of union.

The practical construction of a topological basis may be equivalent to the
construction of a measure by using its definition: for instance, let us consider
$\mathbb{R}^2$: the construction of a topological basis requires the generation of a
geometrically arbitrary region by using the elementary set operations on
elements of $\mathfrak{B}$, which is equivalent to the definition of the area of a
geometrically arbitrary region of $\mathbb{R}^2$.

So, the practical procedure which is most frequently used for the definition
of measures consists of using a geometrically simple family $\mathfrak{B}$ (for instance,
the set of the rectangles of $\mathbb{R}^2$), even if this set is not a topological basis. As
previously observed, some "pathological" subsets of $\Omega$ are not members of
$\Sigma(\Omega, \mathfrak{B})$. These subsets are excluded from the theory below, which is limited
to the parts of $\Omega$ which are $\mu$-measurable:

$$\mathfrak{M}(\mu, \Omega) = \{A \subset \Omega : \mu^i(A) = \mu^e(A)\}.$$

This exclusion is generally used implicitly: we simply write "$A \subset \Omega$"
instead of "$A \in \mathfrak{M}(\mu, \Omega)$". But the reader must keep in mind that the
exclusion of some subsets has an impact on some definitions, such as, for
instance, null sets, almost sure events and negligible events. Thus, this
exclusion has an impact on the whole theory presented below.

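Among the remarkable properties of the measures generated by the method
above, we underline the following:

Proposition 1.14.- Let $\Omega_1 \subset \Omega_2$, $\mathfrak{B}_1 \subset \mathcal{P}(\Omega_1)$, $\mathfrak{B}_2 \subset \mathcal{P}(\Omega_2)$, $\mathfrak{B}_1 \subset \mathfrak{B}_2$,
$B \in \mathfrak{B}_2 \implies \Omega_1 \cap B \in \mathfrak{B}_1$, $\mu_1$ associated to $\mathfrak{B}_1$, $\mu_2$ associated to $\mathfrak{B}_2$ such that
$\mu_1 = \mu_2$ on $\mathfrak{B}_1$. Then $\mu_1^e(A) = \mu_2^e(A)$ and $\mu_1^i(A) = \mu_2^i(A)$ for all $A \subset \Omega_1$. ■

PROOF.- Indeed, it is immediate that $\mu_1^e(A) \geq \mu_2^e(A)$, since $\mathfrak{B}_1 \subset \mathfrak{B}_2$.
Moreover, since $A \subset \Omega_1 \subset \Omega_2$:

$$A \subset \bigcup_{n \in \mathbb{N}} B_n \implies A \subset \bigcup_{n \in \mathbb{N}} (\Omega_1 \cap B_n).$$

Thus,

$$\sum_{n \in \mathbb{N}} \mu_2(B_n) \geq \sum_{n \in \mathbb{N}} \mu_1(\Omega_1 \cap B_n) \geq \mu_1^e(A)$$

and we have also $\mu_2^e(A) \geq \mu_1^e(A)$, which establishes the equality between the
two external measures. In addition, $\Omega_2 - A = (\Omega_2 - \Omega_1) \cup (\Omega_1 - A)$ and we have:

$$\mu_2^e(\Omega_2 - A) = \mu_2(\Omega_2 - \Omega_1) + \mu_2^e(\Omega_1 - A)$$

and

$$\mu_2^e(\Omega_2 - A) = \mu_2(\Omega_2) - \mu_2(\Omega_2 \cap \Omega_1) + \mu_2^e(\Omega_1 - A),$$

i.e.,

$$\mu_2^e(\Omega_2 - A) = \mu_2(\Omega_2) - \mu_2(\Omega_1) + \mu_2^e(\Omega_1 - A)$$

and

$$\mu_2(\Omega_2) - \mu_2^e(\Omega_2 - A) = \mu_2(\Omega_1) - \mu_2^e(\Omega_1 - A).$$

Since $\mu_2(\Omega_1) = \mu_1(\Omega_1)$ and $\mu_2^e(\Omega_1 - A) = \mu_1^e(\Omega_1 - A)$, we obtain the
result. ■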

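$\mathfrak{M}(\mu, \Omega)$ has the properties of the $\sigma$-algebras:

1) If $A \in \mathfrak{M}(\mu, \Omega)$ then $\Omega - A \in \mathfrak{M}(\mu, \Omega)$: $\mu(\Omega - A) = \mu(\Omega) - \mu(A)$;

2) If $\{A_n\}_{n \in \mathbb{N}} \subset \mathfrak{M}(\mu, \Omega)$ satisfies $A_i \cap A_j = \emptyset$ for $i \neq j$ then
$\bigcup_{n \in \mathbb{N}} A_n \in \mathfrak{M}(\mu, \Omega)$: $\mu\left(\bigcup_{n \in \mathbb{N}} A_n\right) = \sum_{n \in \mathbb{N}} \mu(A_n)$;

3) If $\{A_n\}_{n \in \mathbb{N}} \subset \mathfrak{M}(\mu, \Omega)$ satisfies $A_i \subset A_j$ for $i < j$ then $\bigcup_{n \in \mathbb{N}} A_n \in \mathfrak{M}(\mu, \Omega)$:
$\mu\left(\bigcup_{n \in \mathbb{N}} A_n\right) = \limsup \mu(A_n)$;

4) If $\{A_n\}_{n \in \mathbb{N}} \subset \mathfrak{M}(\mu, \Omega)$ satisfies $A_i \subset A_j$ for $i > j$ then $\bigcap_{n \in \mathbb{N}} A_n \in \mathfrak{M}(\mu, \Omega)$:
$\mu\left(\bigcap_{n \in \mathbb{N}} A_n\right) = \liminf \mu(A_n)$.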

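We have:

Proposition 1.15.- If $\forall n \in \mathbb{N} : A_n \in \mathfrak{M}(\mu, \Omega)$ then
$\mu\left(\bigcup_{n \in \mathbb{N}} A_n\right) \leq \sum_{n \in \mathbb{N}} \mu(A_n)$. ■

PROOF.- Since $A_n \in \mathfrak{M}(\mu, \Omega)$, $\forall n \in \mathbb{N}$, we have
$A = \bigcup_{n \in \mathbb{N}} A_n \in \mathfrak{M}(\mu, \Omega)$. Let $B_0 = A_0$ and, for $n \geq 1$,
$B_n = A_n - \bigcup_{i=0}^{n-1} B_i$. We have $B_n \in \mathfrak{M}(\mu, \Omega)$, $\forall n \in \mathbb{N}$. Moreover,
$B_i \cap B_j = \emptyset$ for $i \neq j$ and $A = \bigcup_{n \in \mathbb{N}} B_n$. Thus,

$$\mu(A) = \mu\left(\bigcup_{n \in \mathbb{N}} B_n\right) = \sum_{n \in \mathbb{N}} \mu(B_n).$$

Now, $B_n \subset A_n$, so that

$$\mu(B_n) \leq \mu(A_n), \quad \forall n \in \mathbb{N};$$

this establishes the result. ■

Corollary 1.6.- If $\forall n \in \mathbb{N} : \mu(A_n) = 0$ then $\mu\left(\bigcup_{n \in \mathbb{N}} A_n\right) = 0$. ■

PROOF.- Since $\mu(A_n) = 0$, we have:

$$0 \leq \mu\left(\bigcup_{n \in \mathbb{N}} A_n\right) \leq \sum_{n \in \mathbb{N}} \mu(A_n) = 0. \quad ■$$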

1.8.2. Lebesgue measure on $\mathbb{R}^p$

Let $a \in \mathbb{R}^p$ and $b \in \mathbb{R}^p$. The set:

$$\mathcal{R}(a, b) = \{x \in \mathbb{R}^p : a_n < x_n < b_n,\ 1 \leq n \leq p\} = \prod_{n=1}^{p} (a_n, b_n)$$

is an open "rectangle" from $\mathbb{R}^p$.

Let us consider $M \in \mathbb{R}$, $M > 0$, and

$$\Omega_M = (-M, M)^p = \{x \in \mathbb{R}^p : -M < x_n < M,\ 1 \leq n \leq p\}.$$

The set of all the open "rectangles" from $\Omega_M$ is:

$$\mathfrak{R}(\Omega_M) = \{\mathcal{R}(a, b) : \mathcal{R}(a, b) \subset \Omega_M\}.$$

The measure of an element of $\mathfrak{R}(\Omega_M)$ may be defined as (Borel measure):

$$\ell_M(\mathcal{R}(a, b)) = \prod_{n=1}^{p} (b_n - a_n).$$

$\ell_M$ may be extended to $\Sigma(\Omega_M, \mathfrak{R}(\Omega_M))$ by using the concepts of internal
and external measures, as in the preceding section. One of the remarkable
properties of $\ell_M$ is the following:

Proposition 1.16.- Let $N > M > 0$ and $A \subset \Omega_M$. Then $\ell_M^e(A) = \ell_N^e(A)$
and $\ell_M^i(A) = \ell_N^i(A)$. ■

PROOF.- The result follows directly from the last proposition of the
preceding section. ■

The measure defined by this procedure is referred to as the Lebesgue
measure and is denoted by $\ell$, since it is invariant with respect to $M$. The set
formed by the subsets of $\Omega \subset \mathbb{R}^p$ which are measurable in the sense of
Lebesgue or Lebesgue-measurable is:

$$\mathfrak{M}(\ell, \Omega) = \bigcup_{n=1}^{+\infty} \mathfrak{M}(\ell_n, \Omega).$$

Among the interesting properties of the Lebesgue measure, we may stress
the possibility of decomposition into a product of measures - the Borel
measure may be written as a product of interval measures, such as:

$$\ell_M(\mathcal{R}(m, M)) = \prod_{n=1}^{p} \ell_M((m_n, M_n)).$$

So, the measure on $\mathbb{R}^p$ may be considered as the product of $p$ measures on
$\mathbb{R}$. This property is used in Fubini's theorem.


1.8.3. Equivalent functions

It is now convenient to introduce one of the fundamental building blocks of the theory under construction: the concept of $\mu$-equivalent functions.

Definition 1.12 ($\mu$-equivalent functions).- Let $f, g : \Omega \to \mathbb{R}^p$ be two functions. We say that $f$ and $g$ are $\mu$-equivalent if and only if $A = \{\omega \in \Omega : f(\omega) \ne g(\omega)\}$ is a $\mu$-null set. ■

This definition introduces an equivalence relation between the functions:

\[ f \sim_\mu g \iff f \text{ and } g \text{ are } \mu\text{-equivalent}. \]

Indeed,

\[ f \sim_\mu f; \quad f \sim_\mu g \Rightarrow g \sim_\mu f; \quad f \sim_\mu g \text{ and } g \sim_\mu h \Rightarrow f \sim_\mu h. \]

The class of equivalence of $f$ is:

\[ [f]_\mu = \{g : \Omega \to \mathbb{R}^p : g \sim_\mu f\}. \]

The members of $[f]_\mu$ are usually identified with $f$, since they may differ only on a null set. For instance - as we will see in the following - all the members of $[f]_\mu$ have the same integral, and this common value becomes the integral of the class $[f]_\mu$ and not only the integral of $f$.

In the following, we will manipulate some particular sets of functions: 

Definition 1.13 (measurable function).- Let $f : \Omega \to \mathbb{R}^p$ be a function. We say that $f$ is $\mu$-measurable on $\Omega$ if and only if $f^{-1}(\mathfrak{M}(\ell, \mathbb{R}^p)) \subset \mathfrak{M}(\mu, \Omega)$, i.e. if the inverse image by $f$ of every measurable set is a measurable set. The set formed by the functions $\mu$-measurable on $\Omega$ taking their values in $\mathbb{R}^p$ is denoted by:

\[ \mathcal{M}(\mu, \Omega, \mathbb{R}^p) = \{f : \Omega \to \mathbb{R}^p : A \in \mathfrak{M}(\ell, \mathbb{R}^p) \Rightarrow f^{-1}(A) \in \mathfrak{M}(\mu, \Omega)\}. \] ■

Definition 1.14 (simple function).- Let $f : \Omega \to \mathbb{R}^p$ be a function. We say that $f$ is a simple function or elementary function on $\Omega$ if and only if:

\[ f \in \mathcal{M}(\mu, \Omega, \mathbb{R}^p) \text{ and } f(\Omega) \text{ is finite}, \]

i.e. $f$ is a measurable function taking a finite number of values. The set formed by the simple functions on $\Omega$ is:

\[ \mathcal{E}(\mu, \Omega, \mathbb{R}^p) = \{f \in \mathcal{M}(\mu, \Omega, \mathbb{R}^p) : f(\Omega) \text{ is finite}\}. \] ■

For any $e \in \mathcal{E}(\mu, \Omega, \mathbb{R}^p)$, we define:

\[ e^*(\Omega) = e(\Omega) - \{0\} = \{\alpha \in \mathbb{R}^p : \alpha \ne 0 \text{ and } \alpha \in e(\Omega)\}. \]

Since $e$ is $\mu$-measurable, we have:

\[ \forall y \in e^*(\Omega) : e^{-1}(\{y\}) \text{ is } \mu\text{-measurable}. \]

Eliminating from $e^*(\Omega)$ the terms corresponding to null sets, we have:

\[ e^*_\mu(\Omega) = \{\alpha \in e^*(\Omega) : \mu(e^{-1}(\{\alpha\})) > 0\}. \]

Definition 1.15 (characteristic function of a subset).- Let $A \subset \Omega$. We denote by $1_A$ the characteristic function of $A$, given by:

\[ 1_A(\omega) = 1, \text{ if } \omega \in A; \quad 1_A(\omega) = 0, \text{ if } \omega \notin A. \] ■

The characteristic function of a subset must be distinguished from the characteristic function of a random variable: the latter is the Fourier transform of the probability distribution of the random variable (see section 1.10). Characteristic functions of subsets may be used in order to represent the subsets themselves and have a strong connection to the notion of conditional probability (see section 1.11.1.1). They have some useful properties, such as:

\[ 1_A \cdot 1_B = 1_{A \cap B}, \quad \max\{1_A, 1_B\} = 1_A + 1_B - 1_A \cdot 1_B = 1_{A \cup B}, \]

\[ 1_A(\Omega) = \{0, 1\}, \quad (1_A)^{-1}(\{0\}) = \Omega - A, \quad (1_A)^{-1}(\{1\}) = A. \]

Proposition 1.17.- We have:

i) $A \in \mathfrak{M}(\mu, \Omega)$ if and only if $1_A \in \mathcal{M}(\mu, \Omega)$;

ii) If $e \in \mathcal{E}(\mu, \Omega, \mathbb{R}^p)$, $e^*(\Omega)$ has exactly $n$ elements and $e^*(\Omega) = \{\alpha_1, \ldots, \alpha_n\}$, $A_i = e^{-1}(\{\alpha_i\})$, then $e = \sum_{i=1}^{n} \alpha_i 1_{A_i}$;

iii) If $e \in \mathcal{E}(\mu, \Omega, \mathbb{R}^p)$, $e^*_\mu(\Omega)$ has exactly $m$ elements and $e^*_\mu(\Omega) = \{\beta_1, \ldots, \beta_m\}$, $B_i = e^{-1}(\{\beta_i\})$, then $e \sim_\mu \sum_{i=1}^{m} \beta_i 1_{B_i}$. ■

Proof.-

i): is an immediate consequence of the properties of the characteristic function: $(1_A)^{-1}(\{0\}) = \Omega - A$ and $(1_A)^{-1}(\{1\}) = A$.

ii): we have

\[ \omega \in \Omega \Rightarrow \text{either } e(\omega) = 0 \text{ or } \omega \in \bigcup_{i=1}^{n} A_i. \]

By using that $A_i \cap A_j = \emptyset$ for $i \ne j$, we obtain the result.

iii): is immediate. ■
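The identities for characteristic functions and the decomposition of Proposition 1.17 ii) can be verified on a finite set. The sketch below assumes $\Omega = \{0, \ldots, 9\}$ and an arbitrarily chosen simple function; the helper `indicator` is a hypothetical name.

```python
def indicator(A):
    """Characteristic function 1_A of a subset A."""
    return lambda w: 1 if w in A else 0

Omega = range(10)
A, B = {1, 2, 3, 4}, {3, 4, 5}
one_A, one_B = indicator(A), indicator(B)
one_AB, one_AuB = indicator(A & B), indicator(A | B)

# 1_A * 1_B = 1_{A n B} and max{1_A, 1_B} = 1_A + 1_B - 1_A 1_B = 1_{A u B}
product_ok = all(one_A(w) * one_B(w) == one_AB(w) for w in Omega)
max_ok = all(
    max(one_A(w), one_B(w))
    == one_A(w) + one_B(w) - one_A(w) * one_B(w)
    == one_AuB(w)
    for w in Omega
)

# A simple function e and its decomposition e = sum_i alpha_i 1_{A_i}
e = {w: (2.0 if w < 3 else 5.0 if w < 6 else 0.0) for w in Omega}
levels = sorted(set(e.values()) - {0.0})                    # e*(Omega)
preimages = {a: {w for w in Omega if e[w] == a} for a in levels}
rebuilt = {w: sum(a * indicator(preimages[a])(w) for a in levels) for w in Omega}
```

The dictionary `rebuilt` coincides with `e` because the preimages of distinct non-null values are disjoint, which is exactly the argument used in the proof.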

When $\Omega \subset \mathbb{R}^n$, we may consider rectangular simple functions:

Definition 1.16 (simple rectangular function).- Let $\Omega \subset \mathbb{R}^n$ and $f \in \mathcal{M}(\mu, \Omega, \mathbb{R}^p)$. We say that $f$ is a simple rectangular function on $\Omega$ if and only if $f \in \mathcal{E}(\mu, \Omega, \mathbb{R}^p)$ and

\[ A = f^{-1}(\{\alpha\}) \in \mathfrak{R}(\Omega), \ \forall \alpha \in f^*(\Omega), \]

i.e. the inverse image $f^{-1}(\{\alpha\})$ of each non-null $\alpha$ from the image of $f$ is a rectangle. The set formed by the rectangular simple functions on $\Omega$ is:

\[ \mathcal{E}_R(\mu, \Omega, \mathbb{R}^p) = \{f \in \mathcal{E}(\mu, \Omega, \mathbb{R}^p) : A = f^{-1}(\{\alpha\}) \in \mathfrak{R}(\Omega), \ \forall \alpha \in f^*(\Omega)\}. \] ■

As we will see in the following, $\mathcal{E}_R(\mu, \Omega, \mathbb{R}^p)$ has an essential role in the construction of integrals, due to its density properties, namely in the situation where $\Omega = \mathcal{R}(a, b) \subset \mathbb{R}^n$. When $\Omega = (a, b) \subset \mathbb{R}$ is an interval of real numbers, $f$ is a rectangular simple function if and only if there are $n > 0$, $(f_1, \ldots, f_n) \in [\mathbb{R}^p]^n$ and $n$ subintervals $A_i = (a_i, b_i) \subset (a, b)$, $1 \le i \le n$, such that:

\[ f = \sum_{i=1}^{n} f_i 1_{A_i}. \]

In this case, we may consider only simple functions generated by partitions:

Definition 1.17 (partition of an interval).- Let $\Omega = (a, b) \subset \mathbb{R}$. An $n$-partition of $\Omega$ is an element $t = (t_0, \ldots, t_n) \in \mathbb{R}^{n+1}$ such that:

\[ t_0 = a, \quad t_n = b, \quad t_{i-1} < t_i \text{ for } 1 \le i \le n. \]

The diameter of the partition is:

\[ \delta(t) = \max\{t_i - t_{i-1} : 1 \le i \le n\}. \]

The set formed by the $n$-partitions of $\Omega$ is:

\[ \mathcal{P}art_n(\Omega) = \{t = (t_0, \ldots, t_n) \in \mathbb{R}^{n+1} : t \text{ is an } n\text{-partition of } \Omega\} \]

and the set of the partitions of $\Omega$ is:

\[ \mathcal{P}art(\Omega) = \bigcup_{n=1}^{+\infty} \mathcal{P}art_n(\Omega). \] ■
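The notions of $n$-partition and diameter translate directly into a few lines of code. The sketch below is illustrative only; the equally spaced partition built by `uniform_partition` is an assumed choice, and any strictly increasing sequence from $a$ to $b$ would do.

```python
def is_partition(t, a, b):
    """Check t_0 = a, t_n = b and t_{i-1} < t_i for 1 <= i <= n."""
    return len(t) >= 2 and t[0] == a and t[-1] == b and all(
        t[i - 1] < t[i] for i in range(1, len(t))
    )

def diameter(t):
    """delta(t) = max of t_i - t_{i-1} over 1 <= i <= n."""
    return max(t[i] - t[i - 1] for i in range(1, len(t)))

def uniform_partition(a, b, n):
    """n-partition of (a, b) with equally spaced points (an assumed choice)."""
    return [a + (b - a) * i / n for i in range(n + 1)]

t = uniform_partition(0.0, 1.0, 4)
```

Refining the partition (larger `n`) makes the diameter smaller, which is the regime in which the finite sums of the next section approach an integral.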

Definition 1.18 (simple function defined by a partition).- Let $\Omega = (a, b) \subset \mathbb{R}$ and $f \in \mathcal{M}(\mu, \Omega, \mathbb{R}^p)$. We say that $f$ is a simple function defined by a partition if and only if there exist an $n$-partition $t \in \mathcal{P}art(\Omega)$ and $(f_1, \ldots, f_n) \in [\mathbb{R}^p]^n$ such that:

\[ t = (t_0, \ldots, t_n) \in \mathbb{R}^{n+1}, \quad f = \sum_{i=1}^{n} f_i 1_{A_i} \quad \text{and} \quad A_i = (t_{i-1}, t_i). \]

The set formed by the simple functions defined by partitions is denoted by $\mathcal{E}_P(\mu, \Omega, \mathbb{R}^p)$. ■

1.8.4. Integrals

The definition of measures generally aims at the manipulation of integrals: the goal is to give a meaning to expressions having the form $\int_\Omega f$, where $\Omega \subset \mathbb{R}^p$ is measurable and $f : \Omega \to \mathbb{R}$ is a function. Naively, we may consider such an integral as the limit of a sequence of finite sums, which may be defined as:

\[ \int_\Omega f \approx \sum_{i=1}^{n} f(x_i)\, \ell(A_i); \quad \bigcup_{i=1}^{n} A_i = \Omega; \quad x_i \in A_i \text{ for } 1 \le i \le n \]

and the limit of these finite sums is taken for, on the one hand, $n \to +\infty$ and, on the other hand, the maximal diameter of the subsets $A_i$ going to zero.

More formally, we start by the definition of integrals over the set of the simple functions:

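Definition 1.19 (integral of a simple function).- For $e \in \mathcal{E}(\mu, \Omega, \mathbb{R})$ such that $e^*(\Omega)$ has exactly $n$ elements and $e^*(\Omega) = \{\alpha_1, \ldots, \alpha_n\}$, we define:

\[ \int_\Omega e\, \mu(dx) = \sum_{i=1}^{n} \alpha_i\, \mu(A_i); \quad A_i = e^{-1}(\{\alpha_i\}) \text{ for } 1 \le i \le n. \] ■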
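Definition 1.19 can be evaluated mechanically when $\Omega$ is finite. The sketch below assumes a measure given by point masses stored in a dictionary; the function name `integral_simple` and the sample data are hypothetical.

```python
from fractions import Fraction

def integral_simple(e, mu):
    """Integral of a simple function e on a finite Omega.

    e  : dict mapping each point of Omega to a real value
    mu : dict mapping each point of Omega to its non-negative mass
    """
    levels = set(e.values()) - {0}                  # e*(Omega): non-null values
    total = 0
    for alpha in levels:
        A = [w for w in e if e[w] == alpha]         # A_i = e^{-1}({alpha_i})
        total += alpha * sum(mu[w] for w in A)      # alpha_i * mu(A_i)
    return total

mu = {w: Fraction(1, 4) for w in "abcd"}            # assumed uniform masses
e = {"a": 3, "b": 3, "c": -1, "d": 0}
value = integral_simple(e, mu)
```

On a finite set the result agrees with the pointwise sum of $e(\omega)\,\mu(\{\omega\})$, which gives a quick consistency check of the definition.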


The definition on $\mathcal{E}(\mu, \Omega, \mathbb{R})$ is extended to the set of positive measurable functions:

\[ \mathcal{M}^+(\mu, \Omega, \mathbb{R}) = \{m \in \mathcal{M}(\mu, \Omega, \mathbb{R}) : m \ge 0\} \]

as follows: let $f \in \mathcal{M}^+(\mu, \Omega, \mathbb{R})$ and

\[ \mathcal{E}^+(\mu, \Omega, f) = \{e \in \mathcal{E}(\mu, \Omega, \mathbb{R}) : 0 \le e \le f\}. \]

$\mathcal{E}^+(\mu, \Omega, f)$ is the set of the positive simple functions which are upper bounded by $f$. We set:

\[ \int_\Omega f\, \mu(dx) = \sup\left\{\int_\Omega e\, \mu(dx) : e \in \mathcal{E}^+(\mu, \Omega, f)\right\}. \]

The next step consists of considering a function $f \in \mathcal{M}(\mu, \Omega, \mathbb{R})$ having an arbitrary sign. In this case, we use the decomposition

\[ f = f^+ - f^-, \quad f^+(x) = \max\{f(x), 0\}, \quad f^-(x) = \max\{-f(x), 0\} \]

and we define

\[ \int_\Omega f\, \mu(dx) = \int_\Omega f^+\, \mu(dx) - \int_\Omega f^-\, \mu(dx). \]

We notice that the right member is well-defined, given that $f^+$ and $f^-$ are elements of $\mathcal{M}^+(\mu, \Omega, \mathbb{R})$. Sometimes, we will use one of the notations $\int_\Omega f\, \mu(x \in dx)$, $\int_\Omega f\, d\mu$ or $\int_\Omega f\, d\mu(x)$ instead of $\int_\Omega f\, \mu(dx)$.

This definition leads to the usual properties of integrals. For instance, for $\alpha, \beta \in \mathbb{R}$; $f, g \in \mathcal{M}(\mu, \Omega, \mathbb{R})$:

\[ \int_\Omega (\alpha f + \beta g)\, \mu(dx) = \alpha \int_\Omega f\, \mu(dx) + \beta \int_\Omega g\, \mu(dx); \]

\[ \int_\Omega f\, \mu(dx) = \sum_{i=1}^{n} \int_{A_i} f\, \mu(dx) \quad \text{if } \bigcup_{i=1}^{n} A_i = \Omega \text{ and } A_i \cap A_j = \emptyset \text{ for } i \ne j; \]

\[ f \le g \ \mu\text{-a.e. on } \Omega \Rightarrow \int_\Omega f\, \mu(dx) \le \int_\Omega g\, \mu(dx); \]

\[ \left| \int_\Omega f\, \mu(dx) \right| \le \int_\Omega |f|\, \mu(dx); \]

\[ \mu(A) = \int_\Omega 1_A\, \mu(dx) = \int_A 1\, \mu(dx) = \int_A \mu(dx), \ \forall A \in \mathfrak{M}(\mu, \Omega). \]

We have also Jensen's inequality: if $\mu$ is a probability, $f \in \mathcal{M}(\mu, \Omega, \mathbb{R})$ and $g : \mathbb{R} \to \mathbb{R}$ is a continuous convex function such that $g(f)$ is measurable, we have:

\[ g\left(\int_\Omega f\, \mu(dx)\right) \le \int_\Omega g(f)\, \mu(dx). \qquad [1.10] \]

An immediate consequence of Jensen's inequality is the following:

Proposition 1.18.- $|E(U)|^p \le E(|U|^p)$, for all $1 \le p < \infty$ such that $|U|^p$ is measurable. ■

Proof.- It results from the application of equation [1.10] to $g(\xi) = |\xi|^p$. ■
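Proposition 1.18 can be observed numerically on an empirical distribution, which is itself a probability measure, so the inequality holds for the sample means and not only in the limit. The sketch below uses an arbitrarily chosen Gaussian sample; the seed, the sample size and the values of $p$ are assumptions made for illustration.

```python
import random

def mean(values):
    """Arithmetic mean, i.e. the expectation under the empirical distribution."""
    return sum(values) / len(values)

random.seed(0)
sample = [random.gauss(0.5, 2.0) for _ in range(10_000)]

checks = {}
for p in (1, 1.5, 2, 3):
    lhs = abs(mean(sample)) ** p                 # |E(U)|^p
    rhs = mean([abs(u) ** p for u in sample])    # E(|U|^p)
    checks[p] = (lhs, rhs)
```

Each pair in `checks` satisfies `lhs <= rhs`; the gap grows with $p$ because $\xi \mapsto |\xi|^p$ becomes more strongly convex.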
We have: 

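Proposition 1.19.- Let $f \in \mathcal{M}(\mu, \Omega, \mathbb{R})$ be such that $\forall A \in \mathfrak{M}(\mu, \Omega)$: $\int_A f\, \mu(dx) = 0$. Then $f = 0$ $\mu$-a.e. on $\Omega$. ■

Proof.- Let $A = \{\omega \in \Omega : |f(\omega)| > 0\}$: let us show that $\mu(A) = 0$. It is enough to show that $A^+ = \{\omega \in \Omega : f(\omega) > 0\}$ is a null set, since the set $\{\omega \in \Omega : f(\omega) < 0\}$ is handled by applying the same argument to $-f$.

Let $n, k > 0$. Let us consider:

\[ A_k = \left\{x \in \Omega : f(x) > \frac{1}{k}\right\} \]

and

\[ A_{n,k} = \left\{x \in \Omega : n > f(x) > \frac{1}{k}\right\}. \]

We have $A^+ = \bigcup_{k \in \mathbb{N}} A_k$ and $A_k = \bigcup_{n \in \mathbb{N}} A_{n,k}$.

In addition, $\left(\frac{1}{k}, n\right) \subset \mathbb{R}$ is measurable, so that $A_{n,k} = f^{-1}\left(\left(\frac{1}{k}, n\right)\right)$ is measurable. We have:

\[ 0 = \int_{A_{n,k}} f\, \mu(dx) \ge \frac{1}{k} \int_{A_{n,k}} \mu(dx) = \frac{1}{k}\, \mu(A_{n,k}). \]

Thus, $\mu(A_{n,k}) \le 0 \Rightarrow \mu(A_{n,k}) = 0$. So, $A_k = \bigcup_{n \in \mathbb{N}} A_{n,k}$ is a null set, which implies that $A^+ = \bigcup_{k \in \mathbb{N}} A_k$ is also a null set. Hence $A$ is a null set. ■

Corollary 1.7.- Let $f \in \mathcal{M}(\mu, \Omega)$ be such that:

\[ \forall g \text{ such that } fg \in \mathcal{M}(\mu, \Omega) : \int_\Omega f g\, \mu(dx) = 0. \]

Then $f = 0$ $\mu$-a.e. on $\Omega$. ■

Proof.- Let $A \in \mathfrak{M}(\mu, \Omega)$. We have:

\[ \int_A f\, \mu(dx) = \int_\Omega f\, 1_A\, \mu(dx) = 0. \]

Since $A$ is arbitrary, the result follows from the preceding proposition. ■

Corollary 1.8.- Let $f \in \mathcal{M}(\mu, \Omega)$ be such that $f \ge 0$. If $\int_\Omega f\, \mu(dx) = 0$, then $f = 0$ $\mu$-a.e. on $\Omega$. ■

Proof.- Let $A \in \mathfrak{M}(\mu, \Omega)$. We have:

\[ 0 \le \int_A f\, \mu(dx) = \int_\Omega f\, \mu(dx) - \int_{\Omega - A} f\, \mu(dx) \le \int_\Omega f\, \mu(dx) = 0. \]

Since $A$ is arbitrary, the result follows from the preceding proposition. ■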

Finally, we define:

\[ \int f = \int f\, \ell(dx). \]

Thus, whenever the measure is not specified, the integral is with respect to the Lebesgue measure.

In the following, we use the usual notation $L^1(\Omega)$ to refer to the set of the classes of equivalence $[f]_\ell$ of the Lebesgue-measurable functions:

\[ L^1(\Omega) = \{[f]_\ell : f \in \mathcal{M}(\ell, \Omega, \mathbb{R})\}. \]

Naively, $L^1(\Omega)$ may be considered as the set of the functions which are Lebesgue-measurable: we may identify $[f]_\ell$ with $f$. This identification is justified by the fact that the integral takes the same value for all the members of the class:

\[ g \in [f]_\mu \Rightarrow \int_\Omega g\, \mu(dx) = \int_\Omega f\, \mu(dx), \]

so that we may consider the common value as the value of the integral of the class:

\[ \int_\Omega [f]_\mu\, \mu(dx) = \int_\Omega f\, \mu(dx). \]

In addition, the last result shows that:

\[ g \sim_\mu f \iff \int_\Omega |f - g|\, \mu(dx) = 0. \]

In an analogous way, we will denote:

\[ L^p(\Omega) = \{f : |f|^p \in L^1(\Omega)\}. \]

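As previously observed, the measure on $\mathbb{R}^p$ may be considered as a product of $p$ measures on $\mathbb{R}$. In fact, this is not an ordinary product, but the equality:

\[ \int_{\mathbb{R}^p} f\, d\ell = \underbrace{\int_{\mathbb{R}} \int_{\mathbb{R}} \cdots \int_{\mathbb{R}}}_{p \text{ times}} f\, d\ell(x_1)\, d\ell(x_2) \cdots d\ell(x_p). \]

This decomposition leads to Fubini's theorem:

\[ \int f\, d\ell = \int \left( \int f(x, y)\, d\ell(x) \right) d\ell(y) = \int \left( \int f(x, y)\, d\ell(y) \right) d\ell(x). \]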
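The exchange of the order of integration can be illustrated numerically. The sketch below uses a midpoint rule on a rectangle of $\mathbb{R}^2$; the integrand $f(x, y) = x y^2$, the domain $(0, 1) \times (0, 2)$ and the number of nodes are assumptions chosen so that the exact value, $4/3$, is known.

```python
def midpoint_nodes(a, b, n):
    """Midpoints of n equal subintervals of (a, b), with the common width."""
    h = (b - a) / n
    return [a + (i + 0.5) * h for i in range(n)], h

def iterated_integral(f, ax, bx, ay, by, n=200, x_first=True):
    """Iterated midpoint-rule integral of f over (ax, bx) x (ay, by)."""
    xs, hx = midpoint_nodes(ax, bx, n)
    ys, hy = midpoint_nodes(ay, by, n)
    if x_first:   # inner integral in x, outer integral in y
        return sum(sum(f(x, y) for x in xs) * hx for y in ys) * hy
    return sum(sum(f(x, y) for y in ys) * hy for x in xs) * hx

f = lambda x, y: x * y ** 2
I_xy = iterated_integral(f, 0.0, 1.0, 0.0, 2.0, x_first=True)
I_yx = iterated_integral(f, 0.0, 1.0, 0.0, 2.0, x_first=False)
```

Both orders give the same value up to rounding, and both are close to $4/3$; the remaining difference from the exact value is the discretization error of the midpoint rule, not an effect of the order of integration.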


Finally, the definitions above extend to the situation where $f : \Omega \to \mathbb{R}^p$ is a function taking its values in $\mathbb{R}^p$ through an evaluation component by component:

\[ f = (f_1, \ldots, f_p) \Rightarrow \int f\, \mu(dx) = \left( \int f_1\, \mu(dx), \ldots, \int f_p\, \mu(dx) \right). \]

1.8.5. Measures defined by densities

When $\Omega \subset \mathbb{R}^p$, we may generate measures by using densities: naively, a measure $\mu$ is defined by a density $m$ when $\mu(x \in dx) = m(x)\, dx$. Such a measure may be considered as a transformation of the Lebesgue measure which generates the same measurable sets. More formally,

Definition 1.20.- Let $\Omega \subset \mathbb{R}^p$ and $\mu$ be such that $\mathfrak{M}(\mu, \Omega) = \mathfrak{M}(\ell, \Omega)$ (in this case, $\mathcal{M}(\mu, \Omega) = \mathcal{M}(\ell, \Omega)$). We say that $\mu$ is defined by the density $m$ if and only if $m$ is a Lebesgue-measurable function and

\[ \int_\Omega f\, \mu(dx) = \int_\Omega f\, m, \ \forall f \in \mathcal{M}(\mu, \Omega) = \mathcal{M}(\ell, \Omega). \] ■

The existence of densities is studied by the Radon-Nikodym theorem: $\ell$ must be dominant with respect to $\mu$ (i.e. $\ell(A) = 0 \Rightarrow \mu(A) = 0$). This theorem will not be studied in this text.

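We have

Lemma 1.4.- If $\mu$ is defined by the density $m$, then:

\[ \mu(A) = \int_A m, \ \forall A \in \mathfrak{M}(\ell, \Omega). \] ■

Proof.- Let us consider the characteristic function of $A$: $f = 1_A$. We have $1_A(\Omega) = \{0, 1\}$ and $1_A \in \mathcal{M}(\ell, \Omega)$ (since $(1_A)^{-1}(\{0\}) = \Omega - A$ and $(1_A)^{-1}(\{1\}) = A$). Thus:

\[ \mu(A) = \int_\Omega 1_A\, \mu(dx) = \int_\Omega 1_A\, m = \int_A m. \] ■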
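A one-dimensional sketch may help to fix ideas. It assumes $\Omega = (0, 1)$ and the density $m(x) = 2x$, for which $\mu((a, b)) = b^2 - a^2$; the integrals are approximated by a midpoint rule, and all function names are hypothetical.

```python
def integrate(g, a, b, n=1000):
    """Midpoint-rule approximation of the Lebesgue integral of g on (a, b)."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

m = lambda x: 2.0 * x                  # assumed density on Omega = (0, 1)

def mu(a, b):
    """mu((a, b)) computed as the integral of m over (a, b), as in Lemma 1.4."""
    return integrate(m, a, b)

def integral_wrt_mu(f, a=0.0, b=1.0):
    """Integral of f with respect to mu, computed as the integral of f * m."""
    return integrate(lambda x: f(x) * m(x), a, b)

total_mass = mu(0.0, 1.0)              # close to 1: m is a probability density
mass = mu(0.2, 0.5)                    # close to 0.5**2 - 0.2**2 = 0.21
first_moment = integral_wrt_mu(lambda x: x)   # close to 2/3
```

The two helper functions mirror the two statements above: Definition 1.20 for a general integrand and Lemma 1.4 for the special case $f = 1_A$.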



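Proposition 1.20.- If $\mu$ is defined by the density $m$ then $m \ge 0$ $\mu$-a.e. and Lebesgue-a.e. on $\Omega$. ■

Proof.- Let $A = \{\omega \in \Omega : m(\omega) < 0\}$: let us show that $\mu(A) = \ell(A) = 0$.

Let $n, k > 0$. Let us consider:

\[ A_k = \left\{\omega \in \Omega : m(\omega) < -\frac{1}{k}\right\} \]

and

\[ A_{n,k} = \left\{\omega \in \Omega : -n < m(\omega) < -\frac{1}{k}\right\}. \]

We have $A = \bigcup_{k \in \mathbb{N}} A_k$ and $A_k = \bigcup_{n \in \mathbb{N}} A_{n,k}$.

$\left(-n, -\frac{1}{k}\right) \subset \mathbb{R}$ is measurable, so that $A_{n,k} = m^{-1}\left(\left(-n, -\frac{1}{k}\right)\right)$ is measurable. We have:

\[ 0 \le \mu(A_{n,k}) = \int_{A_{n,k}} m \le -\frac{1}{k}\, \ell(A_{n,k}). \]

Thus, on the one hand, $\ell(A_{n,k}) \le 0 \Rightarrow \ell(A_{n,k}) = 0$ and, on the other hand, $\mu(A_{n,k}) = 0$. So, $A_k = \bigcup_{n \in \mathbb{N}} A_{n,k}$ is a null set, which implies that $A = \bigcup_{k \in \mathbb{N}} A_k$ is also a null set. ■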

1.9. Measures, probability and integrals in infinite dimensional spaces 

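Let $m \in \mathbb{R}_e^k$ and $M \in \mathbb{R}_e^k$. An open rectangle of $\mathbb{R}^k$ is denoted as $\mathcal{R}_k(m, M)$:

\[ \mathcal{R}_k(m, M) = \{x \in \mathbb{R}^k : m_n < x_n < M_n,\ 1 \le n \le k\} = \prod_{n=1}^{k} (m_n, M_n). \]

The set formed by the open rectangles of $\mathbb{R}^k$ is:

\[ \mathfrak{R}_k = \{\mathcal{R}_k(m, M) : m \in \mathbb{R}_e^k \text{ and } M \in \mathbb{R}_e^k\}. \]

A closed rectangle is denoted as $\overline{\mathcal{R}}_k(m, M)$ and the set of all the closed rectangles of $\mathbb{R}^k$ is $\overline{\mathfrak{R}}_k$. It may be useful to recall that $\mathbb{R}^k$ may be covered by a finite or countable number of subsets of $\mathfrak{R}_k$ or $\overline{\mathfrak{R}}_k$. For instance, for $\varepsilon > 0$, $\mathcal{R}_k(-\infty, \varepsilon) \cup \mathcal{R}_k(-\varepsilon, +\infty) = \overline{\mathcal{R}}_k(-\infty, 0) \cup \overline{\mathcal{R}}_k(0, +\infty) = \mathbb{R}^k$. More useful coverings may be generated by the translations of a fixed rectangle, in order to get a covering formed by identical rectangles. This property is often used in numerical applications.

We use the notation $\mathbb{R}^\infty$ to refer to the space of the sequences of real numbers:

\[ \mathbb{R}^\infty = \{x = (x_1, x_2, x_3, \ldots) : x_n \in \mathbb{R},\ \forall n \in \mathbb{N}^*\}. \]

It may be interesting to consider $\mathbb{R}^\infty$ as a countable Cartesian product: $\mathbb{R}^\infty = \mathbb{R} \times \mathbb{R} \times \ldots = \prod_{n=1}^{+\infty} \mathbb{R}$. We use an analogous notation for $\mathbb{R}_e^\infty$. An open rectangle of $\mathbb{R}^\infty$ is denoted as $\mathcal{R}_\infty(m, M)$, where $m \in \mathbb{R}_e^\infty$ and $M \in \mathbb{R}_e^\infty$:

\[ \mathcal{R}_\infty(m, M) = \{x \in \mathbb{R}^\infty : m_n < x_n < M_n,\ \forall n \in \mathbb{N}^*\} = \prod_{n=1}^{+\infty} (m_n, M_n). \]

The set formed by the open rectangles of $\mathbb{R}^\infty$ is:

\[ \mathfrak{R}_\infty = \{\mathcal{R}_\infty(m, M) : m \in \mathbb{R}_e^\infty \text{ and } M \in \mathbb{R}_e^\infty\}. \]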

As in the finite dimensional situation, $\mathbb{R}^\infty$ may be covered by a finite or countable union of elements from $\mathfrak{R}_\infty$. For instance, $\mathcal{R}_\infty(-\infty, \varepsilon) \cup \mathcal{R}_\infty(-\varepsilon, +\infty) = \overline{\mathcal{R}}_\infty(-\infty, 0) \cup \overline{\mathcal{R}}_\infty(0, +\infty) = \mathbb{R}^\infty$. This property is also used in numerical applications.

The notation $\mathbb{R}_0^\infty$ refers to the subset of $\mathbb{R}^\infty$ formed by the sequences having only a finite number of non-null elements, i.e. the sequences for which the set $e(x) = \{n \in \mathbb{N}^* : x_n \ne 0\}$ has a finite number of elements:

\[ \mathbb{R}_0^\infty = \{x \in \mathbb{R}^\infty : \mathrm{card}(e(x)) = k \in \mathbb{N}\}. \]

Here, $\mathrm{card}(\cdot)$ is the cardinality (number of elements). $\mathbb{R}_0^\infty$ may also be identified with $\Pi = \bigcup_{k \in \mathbb{N}^*} (\mathbb{N}^*)^k \times \mathbb{R}^k$: let $x \in \mathbb{R}_0^\infty$ and $e(x) = \{n_1, \ldots, n_k\}$. The application:

\[ x = (x_1, x_2, x_3, \ldots) \overset{\pi}{\longmapsto} (n_1, \ldots, n_k, x_{n_1}, \ldots, x_{n_k}) \in \Pi \]

is a bijection between $\mathbb{R}_0^\infty$ and $\Pi$. $\Pi$ does not possess open rectangles, given that $(\mathbb{N}^*)^k$ is a discrete space, but an analogous role is played by the set:

\[ \mathfrak{P} = \bigcup_{k \in \mathbb{N}^*} \left( (\mathbb{N}^*)^k \times \mathfrak{R}_k \right). \]

These properties are used in the following.

Let us consider a real separable Hilbert space $(V, (\cdot, \cdot))$, i.e. $V$ is a linear space; $(\cdot, \cdot)$ is a scalar product on $V$; $V$ is complete for the norm $\|\cdot\|$ associated with the scalar product ($\|v\| = \sqrt{(v, v)}$); there exists a countable subset $S \subset V$ which is dense in $(V, \|\cdot\|)$. Since one of our aims is the application of the theory to optimization problems - thus, to the Calculus of Variations - our model for $V$ is a Sobolev functional space: the elements of $V$ are assumed to take their values in $\mathbb{R}^d$ for some $d \in \mathbb{N}^*$. The theory extends to Banach spaces, such as $[L^p(\Omega)]^d$, but this is not our objective.

In order to simplify the notations, $(\cdot, \cdot)$ and $\|\cdot\|$ are used in the following for scalar products and norms on other spaces. No confusion is induced by this simplification. If necessary, the space will be specified by writing $\|\cdot\|_V$.

Since $V$ is separable, this space possesses a Schauder basis, i.e. a countable family $\Phi = \{\varphi_n\}_{n \in \mathbb{N}^*} \subset V$ such that any $v \in V$ may be represented by a unique convergent series as $v = \sum_{n=1}^{+\infty} v_n \varphi_n$. If convenient, we may assume that this basis is orthonormal, i.e. a Hilbertian basis: every Schauder basis may be transformed into a Hilbertian basis by the Gram-Schmidt procedure of orthonormalization.
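The Gram-Schmidt procedure mentioned above can be sketched in a finite dimensional setting. The code below assumes that $\mathbb{R}^n$ with the Euclidean scalar product stands in for $V$ and that a finite linearly independent family stands in for the basis; it is the classical variant of the procedure, written for illustration rather than for numerical robustness.

```python
from math import sqrt

def dot(u, v):
    """Euclidean scalar product (u, v)."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalise linearly independent vectors of R^n (classical variant)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(v, q)                              # coefficient (v, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]  # remove the component along q
        norm = sqrt(dot(w, w))
        if norm < tol:
            raise ValueError("vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis

Q = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
gram = [[dot(qi, qj) for qj in Q] for qi in Q]         # should be the identity
```

The matrix `gram` of scalar products $(q_i, q_j)$ is the identity up to rounding, which is the defining property of an orthonormal family.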

Thus, each $v \in V$ is entirely characterized by the sequence of real numbers $\mathbf{v} = (v_1, v_2, v_3, \ldots) \in \mathbb{R}^\infty$ which is uniquely determined and $V$ may be identified with a subset $\mathcal{V}$ of $\mathbb{R}^\infty$, given by

\[ \mathcal{V} = \{a = (a_1, a_2, a_3, \ldots) : \|a\| < \infty\}; \quad \|a\| = \left( \sum_{n=1}^{+\infty} a_n^2 \right)^{1/2}. \]

Let us consider the application $I : V \to \mathcal{V}$ given by:

\[ v = \sum_{n=1}^{+\infty} v_n \varphi_n \longmapsto \mathbf{v} = (v_1, v_2, v_3, \ldots) \]

and

\[ (x, y) = \sum_{n=1}^{+\infty} x_n y_n. \]

Since

\[ (I(v), I(w)) = (v, w), \]

$I$ is an isometric bijection between $V$ and $\mathcal{V}$, so that $(\mathcal{V}, (\cdot, \cdot))$ is a Hilbert space.

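This identification leads to the construction of topological bases and measures on $V$ by using those defined on $\mathcal{V}$. For instance, we may use $\mathfrak{B}_\infty = \mathfrak{R}_\infty \cap \mathcal{V}$, formed by the open rectangles of $\mathcal{V}$:

\[ \mathfrak{B}_\infty = \{\mathcal{R}_\infty(m, M) : m \in \mathcal{V} \text{ and } M \in \mathcal{V}\} \subset \mathfrak{R}_\infty. \]

The finite dimensional version $\mathfrak{B}_k$ of $\mathfrak{B}_\infty$ associated with $\mathbb{R}^k$ is simply $\mathfrak{B}_k = \mathfrak{R}_k$.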

For practical purposes, such as numerical approximations, we also consider a countable family $\mathcal{F} = \{\varphi_n\}_{n \in \mathbb{N}^*} \subset V$ such that its linear span,

\[ [\mathcal{F}] = \left\{ \sum_{i=1}^{k} a_{n_i} \varphi_{n_i} : k \in \mathbb{N}^*,\ a_{n_i} \in \mathbb{R} \text{ for } 1 \le i \le k \right\}, \]

is dense in $V$, i.e.,

\[ \forall v \in V : \forall \varepsilon > 0 : \exists v_\varepsilon \in [\mathcal{F}] \text{ such that } \|v - v_\varepsilon\| \le \varepsilon. \]

$[\mathcal{F}]$ may be identified with a subset $\mathcal{V}_0$ of $\mathcal{V}$ formed by the sequences $a$ containing only a finite number of non-null elements, i.e.,

\[ \mathcal{V}_0 = \{a \in \mathcal{V} : a \in \mathbb{R}_0^\infty\}. \]

Thus, $\pi \circ I$ is a bijection between $[\mathcal{F}]$ and $\Pi_0 = \pi(\mathcal{V}_0)$. This property may be used in order to construct measures on $[\mathcal{F}]$. Moreover, the introduction of the family $\mathcal{F}$ leads to an extension of the results to separable Banach spaces.

Let $(W, (\cdot, \cdot))$ be another Hilbert space ($W$ is also a functional space and the elements of $W$ are assumed to take their values in $\mathbb{R}^d$ for some $d \in \mathbb{N}^*$, analogous to $V$).

We denote by $\mathcal{L}(V, W)$ the set of all the linear continuous maps $\ell : V \to W$:

\[ \mathcal{L}(V, W) = \{\ell : V \to W : \ell \text{ is linear continuous}\}. \]

For $\ell \in \mathcal{L}(V, W)$,

\[ \|\ell\| = \sup\{\|\ell(v)\| : \|v\| \le 1\} < \infty. \]

Thus, there exists a real number $M \in \mathbb{R}$ such that:

\[ \forall v \in V : \|\ell(v)\| \le M \|v\|. \qquad [1.11] \]

When $W = \mathbb{R}^d$, there exists a real number $M_p \in \mathbb{R}$ such that:

\[ \forall v \in V : |\ell(v)|_p \le M_p \|v\|. \qquad [1.12] \]

For a subset $A \subset V$, we denote by $\mathcal{P}(A)$ the power set of $A$, i.e., the set of all the subsets of $A$.


1.9.1. Finite measures on infinite dimensional Hilbert spaces

The most common way for the manipulation of finite measures on general Banach or Hilbert spaces is the use of cylindrical measures [SCH 69]. Naively, a cylindrical set of $V$ is a subset of a finite dimensional subspace of $V$ and a cylindrical measure is a measure defined on the cylindrical sets of $V$, i.e. an application $\mu : \mathfrak{C} \to \mathbb{R}_e$, where $\mathfrak{C} \subset \mathcal{P}(V)$ is the set of all the cylindrical sets of $V$. More formally, we consider $k \in \mathbb{N}$ and we define $\mathfrak{C}$ as the union of all the inverse images of measurable parts of $\mathbb{R}^k$ for the elements of $\mathcal{L}(V, \mathbb{R}^k)$ and $k \in \mathbb{N}$:

\[ \mathfrak{C} = \{C : C = \ell^{-1}(A),\ \ell \in \mathcal{L}(V, \mathbb{R}^k),\ k \in \mathbb{N},\ A \text{ measurable part of } \mathbb{R}^k\} \subset \mathcal{P}(V). \]

Then, a cylindrical measure is an application $\mu : \sigma(\mathfrak{C}) \to \mathbb{R}_e$ such that $\mu_\ell = \mu \circ \ell^{-1}$ is a measure on $\mathbb{R}^k$, $\forall \ell \in \mathcal{L}(V, \mathbb{R}^k)$, $k \in \mathbb{N}$ (so, $\mu_\ell(A) = \mu(\ell^{-1}(A))$, $A$ measurable subset of $\mathbb{R}^k$). The restriction of a Radon measure to $\mathfrak{C}$ defines a cylindrical measure: in classical analysis, this procedure is the most usual manner for the manipulation of measures in infinite dimensional spaces.

Here, we will adopt a different approach: as previously observed, measures on $\mathbb{R}$ are defined by using Borel algebras and families of geometrically simple subsets $\mathfrak{B}_1$: $\Sigma(\mathbb{R}) = \sigma(\mathfrak{B}_1)$. In practice, we may define a measure $\nu$ on $\mathbb{R}$ by using a density $f$, i.e. a function $f : \mathbb{R} \to \mathbb{R}$ such that $f \ge 0$ on $\mathbb{R}$, $\int_{\mathbb{R}} f < \infty$ and $d\nu = f(x)\, dx$. In this case, we may consider:

\[ \nu((a, b)) = \int_a^b f. \]

When considering probabilities, $f$ is the probability density and $\int_{\mathbb{R}} f = 1$. In this case,

\[ P(dx) = P(x \in dx) = d\nu = f(x)\, dx. \]

This procedure extends directly to $\mathbb{R}^k$ by using $\Sigma(\mathbb{R}^k) = \sigma(\mathfrak{B}_k)$.

A simple way to generate measures on $\mathbb{R}^k$ consists of the use of a product of measures: let $(\nu_1, \ldots, \nu_k)$ be measures on $\mathbb{R}$. Then,

\[ \nu(\mathcal{R}_k(m, M)) = \prod_{n=1}^{k} \nu_n((m_n, M_n)) \]

defines a measure on $\mathbb{R}^k$. Analogous to $\mathbb{R}$, $\nu$ may be defined by a density $f$, i.e. by a function $f : \mathbb{R}^k \to \mathbb{R}$ such that $f \ge 0$ on $\mathbb{R}^k$, $\int_{\mathbb{R}^k} f < \infty$ and $d\nu = f(x)\, dx_1 \ldots dx_k$. Thus,

\[ \nu(\mathcal{R}_k(m, M)) = \int_{\mathcal{R}_k(m, M)} f. \]

Analogous to the one-dimensional situation, when considering probabilities, $f$ is the probability density and $\int_{\mathbb{R}^k} f = 1$. Thus,

\[ P(dx) = P(x \in dx) = d\nu = f(x)\, dx_1 \ldots dx_k. \]

Densities may also be generated by using products: if, for $1 \le n \le k$, $d\nu_n = f_n(t)\, dt$, then $f(x) = \prod_{n=1}^{k} f_n(x_n)$ and

\[ \nu(\mathcal{R}_k(m, M)) = \int_{\mathcal{R}_k(m, M)} f. \]
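A product measure on rectangles is easy to evaluate when each one-dimensional factor has a closed-form distribution. The sketch below assumes exponential densities $f_n(t) = r_n e^{-r_n t}$ on $(0, +\infty)$, with arbitrarily chosen rates $r_n$; these choices and the function names are illustrative only.

```python
from math import exp, prod, inf

def exp_cdf(t, rate):
    """Integral of the density rate*exp(-rate*x) over (0, t); 0 for t <= 0."""
    return 0.0 if t <= 0 else 1.0 - exp(-rate * t)

def nu_n(m, M, rate):
    """nu_n((m, M)) for the one-dimensional exponential density (assumed example)."""
    return max(exp_cdf(M, rate) - exp_cdf(m, rate), 0.0)

def nu(m, M, rates):
    """nu(R_k(m, M)) = product over n of nu_n((m_n, M_n))."""
    return prod(nu_n(mn, Mn, r) for mn, Mn, r in zip(m, M, rates))

rates = (1.0, 2.0, 0.5)
total = nu((-inf,) * 3, (inf,) * 3, rates)           # mass of the whole space
box = nu((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), rates)    # mass of the unit cube
```

Since each factor is a probability, the total mass is 1 and the measure of the unit cube is the product of the three one-dimensional probabilities, which corresponds to independent components.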


This procedure also extends to $\mathbb{R}^\infty$: "naively", a sequence $\{\nu_n\}_{n \in \mathbb{N}^*}$ of measures on $\mathbb{R}$ may be used in order to generate a measure $\nu$ on $\mathbb{R}^\infty$ by using $\nu(A_1 \times A_2 \times \ldots) = \nu_1(A_1)\, \nu_2(A_2) \ldots$. More formally, we use $\mathfrak{R}_\infty$:

\[ \nu(\mathcal{R}_\infty(m, M)) = \prod_{n=1}^{+\infty} \nu_n((m_n, M_n)) \]

defines a measure on $\mathcal{V}$. As previously observed, the use of densities may be more convenient for practical purposes: if $d\nu_n = f_n(t)\, dt$, $\forall n \in \mathbb{N}^*$, then:

\[ \nu_n((m_n, M_n)) = \int_{m_n}^{M_n} f_n(x_n)\, dx_n \]

and

\[ \nu(\mathcal{R}_\infty(m, M)) = \prod_{n=1}^{+\infty} \left( \int_{m_n}^{M_n} f_n(x_n)\, dx_n \right). \]

"Naively", this procedure corresponds to the construction of a density $f : \mathbb{R}^\infty \to \mathbb{R}$ such that $d\nu = f(x)\, dx_1 dx_2 \ldots$, i.e. to a density

\[ f(x) = \prod_{n=1}^{+\infty} f_n(x_n). \]

An analogous approach may be used in order to construct finite measures on $\Pi$: for each $k \in \mathbb{N}^*$, let us consider a finite measure $\nu_k$ on $\mathbb{R}^k$ and $k$ summable sequences of strictly positive real numbers having a sum not greater than 1:

<ik = i,-, <Zfe,fc) € (M°°) fc , q k,i = (qk,i, l, Qk,i, 2, Qk,i, 3 ,-)> 

[i 13 ] 

Qk,i,n — 0 ? for /u, 7, 71 G N , ^ ^ Qk,i,n — Qk,i — 1* 
n = 1 

Let us assume that is a finite measure: 

u k ( ) = Afc < 1 

Let (m, n k ) x lZ k (m, Af) G (N*) fc x be an element of *p and a 
summable sequence of strictly positive real numbers: 


+oo 

p = (pi, P2, P3> •••) € K°° ; Vn > o, Vn G N* ; ^ p n = P > 0 [1.14] 

n = 0 

We define: 

k 

rj((ni,...,n k ) xK k (m, M)) = p k u k (TZ k (m, M) ) 

* = t 

Then 77 is a finite measure on ip. As usual, we may use densities instead of 
measures: 


k 

V ((ni, n fc ) x (m, M)) = p k q kti>n . 

i ■- 1 




where: 

fk,i,n > 0 on M and / f kjltH < 1, for k,i,n G N*. [1.15] 

This procedure generates a finite measure on $\mathbb{R}_0^\infty$: $\nu = \eta \circ \pi$ (i.e. $\nu(A) = \eta(\pi(A))$) is a finite measure on $\mathbb{R}_0^\infty$.

When $V$ has a basis, the identification between $V$ and $\mathcal{V} \subset \mathbb{R}^\infty$ may be combined with this procedure: if $\nu$ is a finite measure on $\mathbb{R}^\infty$, then $\mu = \nu \circ I$ (i.e. $\mu(A) = \nu(I(A))$) is a finite measure on $V$. Analogously, we may consider $\mu = \eta \circ \pi \circ I$ (i.e. $\mu(A) = \eta(\pi(I(A)))$): $\mu$ is a finite measure on $[\mathcal{F}]$.

An alternative measure may be defined as follows: let us consider the situation where the space is $\mathbb{R}^k$. Let $p = (p_1, \ldots, p_k)$ be a vector of positive real numbers ($p_n > 0$, for $1 \le n \le k$) and $f = (f_1, \ldots, f_k)$ be a bounded set of positive functions which are integrable on $\mathbb{R}$ (i.e. $f_n \ge 0$ on $\mathbb{R}$ for $1 \le n \le k$ and there exists $\Lambda \in \mathbb{R}$ such that $\int_{\mathbb{R}} f_n \le \Lambda$, for $1 \le n \le k$). We may use $p$ and $f$ in order to define a finite measure on $\mathbb{R}^k$:

\[ \nu(\mathcal{R}_k(m, M)) = \sum_{n=1}^{k} p_n \left( \int_{m_n}^{M_n} f_n(x_n)\, dx_n \right). \]

Indeed, $\nu$ is a finite measure on $\mathbb{R}^k$: $\nu \ge 0$, $\nu(\emptyset) = 0$, $\nu$ is countably additive, $\nu(\mathbb{R}^k) \le \Lambda \sum_{n=1}^{k} p_n < \infty$.

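This alternative approach may be extended to $\mathbb{R}^\infty$ by using a sequence $p$ verifying [1.14] and a bounded sequence $f = (f_1, f_2, f_3, \ldots)$ of positive integrable functions on $\mathbb{R}$ (i.e. $\forall n \in \mathbb{N}^* : f_n \ge 0$ on $\mathbb{R}$ and there exists $\Lambda \in \mathbb{R}$ such that $\int_{\mathbb{R}} f_n \le \Lambda$, $\forall n \in \mathbb{N}^*$). In this case, we set:

\[ \nu(\mathcal{R}_\infty(m, M)) = \sum_{n=1}^{+\infty} p_n \left( \int_{m_n}^{M_n} f_n(x_n)\, dx_n \right). \]

This series converges when $\mathcal{R}_\infty(m, M) \in \mathfrak{R}_\infty$: for instance, we have:

\[ \left| \sum_{n=k}^{+\infty} p_n \int_{m_n}^{M_n} f_n(x_n)\, dx_n \right| \le \sum_{n=k}^{+\infty} p_n \left| \int_{m_n}^{M_n} f_n(x_n)\, dx_n \right| \le \Lambda \sum_{n=k}^{+\infty} p_n \to 0, \text{ for } k \to +\infty. \]

Analogous to the finite dimensional situation, where the space is $\mathbb{R}^k$, $\nu$ is a finite measure on $\mathbb{R}^\infty$: $\nu \ge 0$, $\nu(\emptyset) = 0$, $\nu$ is countably additive, $\nu(\mathbb{R}^\infty) \le P\Lambda < \infty$.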

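This procedure may be applied to $\mathbb{R}_0^\infty$ in an analogous way: let us assume that $p$ satisfies [1.14], $q_k$ verifies [1.13], $f_{k,i,n}$ satisfies [1.15] and

\[ \sum_{n=1}^{+\infty} q_{k,i,n} = Q_{k,i}, \quad \sum_{i=1}^{k} Q_{k,i} = Q_k \le \Lambda, \]

where $\Lambda$ is independent of $k$. For $(n_1, \ldots, n_k) \times \mathcal{R}_k(m, M) \in (\mathbb{N}^*)^k \times \mathfrak{R}_k \subset \mathfrak{P}$ we define:

\[ \eta((n_1, \ldots, n_k) \times \mathcal{R}_k(m, M)) = p_k \sum_{i=1}^{k} q_{k,i,n_i} \left( \int_{m_i}^{M_i} f_{k,i,n_i}(x_{n_i})\, dx_{n_i} \right). \]

Then $\eta$ is a finite measure on $\mathfrak{P}$: $\eta \ge 0$, $\eta(\emptyset) = 0$, $\eta$ is countably additive, $\eta(\mathfrak{P}) \le P\Lambda < \infty$. Thus, $\eta \circ \pi$ is a finite measure on $\mathbb{R}_0^\infty$.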

Analogous to the preceding situation, $\mu = \nu \circ I$ is a finite measure on $V$ - when $V$ has a basis - and $\mu = \eta \circ \pi \circ I$ is a finite measure on $[\mathcal{F}]$.

1.9.2. Integration involving a measure on an infinite dimensional Hilbert space

As previously remarked, our objective is the application of the theory to problems of calculus of variations: so, $V$ and $W$ are real functional spaces such as, for instance, $[L^p(\Omega)]^n$ or the standard Sobolev spaces, and their elements take their values in $\mathbb{R}^k$ for some $k > 0$. In this case, the extension of the preceding ideas, namely of the integrals defined for finite dimensional situations, is quite natural. For instance, we are interested in the evaluation of the mean value $E(U)$ on $S \subset V$ for a random variable $U$ defined on $V$, i.e. the mean value on $S$ of a function $U : V \to W$, associating with each $v \in V$ an element $U(v) \in W$. We have:

\[ E(U) = \frac{1}{\mu(S)} \int_S U\, d\mu. \]

The evaluation of $\int_S U(v)\, d\mu$ involves a non-standard integration, since $\mu$ is defined on an infinite dimensional space. Nevertheless, the formal definition is entirely analogous to the definition introduced for the finite dimensional situation: for instance, we initially restrict ourselves to the situation where the elements of $W$ are either real numbers or real functions (i.e. $W = \mathbb{R}$ or $W = L^p(\Omega)$).

Analogous to the finite dimensional case, let $\mathcal{M}$ be the set of the measurable functions, i.e. the functions for which the inverse image of every measurable set is a measurable set. We consider the set of measurable finite range functions: $\mathcal{F} = \{f \in \mathcal{M} : f(V) \text{ is a finite set}\}$. For $f \in \mathcal{F}$, we define $\mathbb{R}^*(f, S) = \{\alpha \in \mathbb{R} : \alpha \ne 0 \text{ and } \alpha \in f(S)\}$. The set of elementary functions is:

\[ \mathcal{E}(\mu, S) = \{e \in \mathcal{F} : \mu(e^{-1}(\{y\})) < \infty,\ \forall y \in \mathbb{R}^*(e, S)\}. \]

If $\mu$ is a finite measure, we have $\mathcal{E} = \mathcal{F}$. For $e \in \mathcal{E}(\mu, S)$ such that $\mathbb{R}^*(e, S) = \{\alpha_1, \ldots, \alpha_n\}$, the integral is defined as:

\[ \int_S e\, d\mu = \sum_{i=1}^{n} \alpha_i\, \mu(A_i); \quad A_i = e^{-1}(\{\alpha_i\}) \text{ for } 1 \le i \le n. \]

This definition is extended to the set of positive measurable functions $\mathcal{M}^+ = \{m \in \mathcal{M} : m \ge 0\}$ as follows: let $U \in \mathcal{M}^+$. We consider the positive elementary functions upper bounded by $U$: $\mathcal{E}^+(\mu, S, U) = \{e \in \mathcal{E}(\mu, S) : 0 \le e \le U\}$ and we define:

\[ \int_S U\, d\mu = \sup\left\{ \int_S e\, d\mu : e \in \mathcal{E}^+(\mu, S, U) \right\}. \]

Finally, an arbitrary $U \in \mathcal{M}$ is decomposed as $U = U^+ - U^-$, $U^+(x) = \max\{U(x), 0\}$, $U^-(x) = \max\{-U(x), 0\}$ and we define:

\[ \int_S U\, d\mu = \int_S U^+\, d\mu - \int_S U^-\, d\mu. \]

The right member is well-defined, since both $U^+$ and $U^-$ are elements of $\mathcal{M}^+$. The definition yields the classical inequality $\left| \int_S U\, d\mu \right| \le \int_S |U|\, d\mu$, which corresponds to a particular form of Jensen's inequality: for a convex function $g : \mathbb{R} \to \mathbb{R}$ such that $g(U)$ is measurable, we have $g(E(U)) \le E(g(U))$.

The extension to the situation where the elements of $W$ are either real vectors or real vector-valued functions (i.e. $W = \mathbb{R}^d$ or $W = [L^p(\Omega)]^d$) is performed by considering the components of $U$: we have $U = (U_1, \ldots, U_d)$ and we define:

\[ \int_S U\, d\mu = \left( \int_S U_1\, d\mu, \ldots, \int_S U_d\, d\mu \right). \]

Let $g : \mathbb{R}^d \to \mathbb{R}$ be a continuous convex function such that $g(U)$ is measurable. Since $g$ is the pointwise supremum of all its affine minorants [TIE 84], we have Jensen's inequality:

\[ g\left( \int_S U\, d\mu \right) \le \int_S g(U)\, d\mu \]

and also:

\[ |E(U)|^p \le E(|U|^p), \text{ for all } 1 \le p < \infty \text{ such that } |U|^p \text{ is measurable}. \]

As usual, the integral of $U : V \to W$ on $S \subset V$ may be approximated by:

\[ \int_S U\, d\mu \approx \sum_{i=1}^{n} U(x_i)\, \mu(A_i); \quad \bigcup_{i=1}^{n} A_i = S; \quad x_i \in A_i \text{ for } 1 \le i \le n \qquad [1.16] \]

and corresponds to a limit for $n \to +\infty$, with the maximal diameter of the subsets $A_i$ going to zero. In our numerical calculations, we do not use the approximation [1.16], but a Monte Carlo approximation: we generate a sample $U_1, \ldots, U_{nr}$ of $nr$ variates from the distribution $\mu$ and we approximate:

\[ E(U) \approx \frac{1}{nr} \sum_{i=1}^{nr} U_i. \qquad [1.17] \]
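A minimal sketch of the Monte Carlo approximation [1.17] follows. It assumes that an element of $V$ is represented by a truncated sequence of five coefficients with independent Gaussian components of variance $1/n^2$, and that $U(v) = \|v\|^2$; the sampler, the truncation order and the function names are hypothetical choices made only to obtain a case with a known mean.

```python
import random

def monte_carlo_mean(U, sampler, nr, seed=None):
    """Approximate E(U) by (1/nr) * sum of U(v_i), with v_i drawn by `sampler`."""
    rng = random.Random(seed)
    return sum(U(sampler(rng)) for _ in range(nr)) / nr

def gaussian_coefficients(rng, k=5):
    """Hypothetical sampler: k independent N(0, 1/n^2) coefficients, n = 1..k."""
    return [rng.gauss(0.0, 1.0 / n) for n in range(1, k + 1)]

def squared_norm(v):
    """U(v) = ||v||^2 for a truncated coefficient sequence."""
    return sum(x * x for x in v)

estimate = monte_carlo_mean(squared_norm, gaussian_coefficients, nr=20_000, seed=1)
exact = sum(1.0 / n ** 2 for n in range(1, 6))    # E(||v||^2) = sum of the variances
```

The estimate fluctuates around the exact mean with a statistical error that decreases like $1/\sqrt{nr}$, independently of the number of coefficients retained.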


This approach is more convenient for the situation under consideration: when using standard distributions, such as, for instance, Gaussian or Poissonian distributions, efficient methods of generation exist in the literature.

1.10. Random variables


Let us consider $\Omega$ non-empty and a probability $P$ on $\Omega$ (i.e. a probability space $(\Omega, P)$) (since the $\sigma$-algebra is always the Borel algebra we do not mention it explicitly, but of course the probability is defined for the Borel sets).

A random variable $X$ on $\Omega$ is a measurable function $X : \Omega \to \mathbb{R}$. It may be seen as a generalization of the concept of numerical characteristic. A particular value of $X$ is called a realization of $X$ or a variate from the distribution of $X$ (or simply a variate from $X$). A variate may be considered as the value $X(\omega)$ for a particular $\omega$.

The image of $X$ is $I = X(\Omega) \subset \mathbb{R}$. For any part $J \subset \mathbb{R}$, we set:

\[ P(X \in J) = P(X^{-1}(J)). \]

Since $X$ is measurable, the probability above is defined for every measurable part of $\mathbb{R}$ - namely, for the intervals of $\mathbb{R}$. Thus, we may characterize the global behavior of $X$ by using its cumulative function, i.e.,

\[ F(x) = P(X < x) = P(X \in (-\infty, x)). \]


In this case, we say that $X$ follows the distribution $F$ or $X$ has the distribution $F$.
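The strict inequality in the definition $F(x) = P(X < x)$ matters for discrete variables. The sketch below assumes a fair six-sided die as an example and uses exact rational arithmetic; the names `pmf`, `F` and `prob_interval` are hypothetical.

```python
from fractions import Fraction

pmf = {k: Fraction(1, 6) for k in range(1, 7)}    # fair die (assumed example)

def F(x):
    """Cumulative function with the strict convention F(x) = P(X < x)."""
    return sum(p for k, p in pmf.items() if k < x)

def prob_interval(a, b):
    """P(a <= X < b) computed as F(b) - F(a)."""
    return F(b) - F(a)

eps = Fraction(1, 1000)
left_limit = F(3 - eps)                 # equals F(3): no jump from the left
jump_at_3 = F(3 + eps) - F(3 - eps)     # approximates F(3+) - F(3-) = P(X = 3)
```

With this convention the mass at a point is not included in $F$ at that point, so $F(3) = P(X \in \{1, 2\}) = 1/3$, and the mass $P(X = 3) = 1/6$ appears only to the right of 3.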


Proposition 1.21.- Let $F$ be the cumulative function of $X$. Then:

i) $0 \le F(x) \le 1$, $\forall x \in \mathbb{R}$;

ii) $F$ is monotonically increasing: $a \le b \Rightarrow F(a) \le F(b)$;

iii) $P(X \ge x) = 1 - F(x)$ and $P(a \le X < b) = F(b) - F(a)$;

iv) $\lim_{x \to +\infty} F(x) = 1$ and $\lim_{x \to -\infty} F(x) = 0$;

v) $F$ is left-continuous: $F(x-) = F(x)$ $\left( F(x-) = \lim_{h \to 0+} F(x - h) \right)$;

vi) $F$ is a bounded variation function and its total variation is 1;

vii) $P(X = x) = F(x+) - F(x-)$ $\left( F(x+) = \lim_{h \to 0+} F(x + h) \right)$. ■

Proof.-

i) Since P is a probability and F (x) = P (X < x), we have 0 < F(x) < 

1. 

ii) Let a, b € M such that a <b. Then: 

F(b) = P(X G ( — oo, b )) = P (X € ( — oo, a) U [a, 6)) , 

so that: 

F(b) = P{X £ ( — oo, a)) |P(l£ [a, 6)) = F (a) + P(l£ [a, 6)) 

" v ' ' V ' 

F(a) > 0 

and we have F (b) > F (a). 

iii) Since, 

F(b) =F(a) + P(X € [a, 6)), 
we have: 

P(X€[a, b)) = F(b)-F(a). 

In addition, 

P (X € (— oo, + oo)) = 1, 

so that: 

P(X € ( — oo, x)) + P (X € [x, + oo)) = 1 

^ V ' 

F(x) 

we have: 

P(X > x) = P (X g [x, + oo)) = 1 — F (x) . 

iv) Since $F$ is increasing and $0 \le F(x) \le 1$, $\forall x \in \mathbb{R}$, there exists $\alpha \in \mathbb{R}$
such that:

$$0 \le \lim_{x \to +\infty} F(x) = \alpha \le 1.$$

Let us assume that $\alpha < 1$: we consider $\alpha_n = F(n) = P(X < n)$. Then:

$$\alpha_n \to \alpha \text{ when } n \to +\infty \text{ and } \forall n \in \mathbb{N} : 0 \le \alpha_n \le 1.$$

Let $b_n = F(n+1) - F(n) = P(n \le X < n+1)$. Then:

$$\forall n \in \mathbb{N} : 0 \le b_n \le 1$$

and

$$\forall n \in \mathbb{N} : 1 - \alpha_n = P(X \ge n) = \sum_{i=n}^{+\infty} P(i \le X < i+1) = \sum_{i=n}^{+\infty} b_i.$$

In this case,

$$\forall n \in \mathbb{N} : 0 < 1 - \alpha \le 1 - \alpha_n = \sum_{i=n}^{+\infty} b_i.$$

This inequality yields that $\sum_{i=0}^{+\infty} b_i = +\infty$, since all the terms of the series
are non-negative and the complements of its partial sums do not converge to
zero. Thus:

$$1 - \alpha_0 = \sum_{i=0}^{+\infty} b_i = +\infty,$$

which is a contradiction, since $0 \le \alpha_0 \le 1$. So, $\alpha = 1$.

Let us consider $Y = -X$. Let $G$ be the cumulative function of $Y$: as
previously established,

$$\lim_{y \to +\infty} G(y) = 1.$$

Now, $G(y) = P(Y < y) = P(X > -y) \le P(X \ge -y) = 1 - F(-y)$, so that
$0 \le F(-y) \le 1 - G(y)$.

Thus,

$$0 \le \lim_{x \to -\infty} F(x) = \lim_{y \to +\infty} F(-y) \le 1 - \lim_{y \to +\infty} G(y) = 1 - 1 = 0.$$

v) Let $f(h) = F(x) - F(x - h) = P(x - h \le X < x)$. It is enough to
show that

$$\lim_{h \to 0+} f(h) = 0.$$

$f$ is increasing and bounded on the positive reals: if $0 < h_1 \le h_2$ then
$0 \le f(h_1) \le f(h_2) \le 1$. Thus, there exists a real number $\alpha \ge 0$ such that

$$\alpha = \lim_{h \to 0+} f(h).$$

Assume that $\alpha > 0$: let $\alpha_p = f\left(\frac{1}{p+1}\right)$. Then:

$$\alpha_p \to \alpha \text{ when } p \to +\infty \text{ and } \forall p \in \mathbb{N} : 0 < \alpha \le \alpha_p \le 1.$$

Let us consider the event:

$$A_n = \left\{ \omega \in \Omega : x - \frac{1}{n+1} \le X(\omega) < x - \frac{1}{n+2} \right\}.$$

We have $A_i \cap A_j = \emptyset$ when $i \ne j$, so that $A = \bigcup_{n \in \mathbb{N}} A_n$ satisfies:

$$P(A) = \sum_{n=0}^{+\infty} P(A_n).$$

Now,

$$A = \{ \omega \in \Omega : x - 1 \le X(\omega) < x \},$$

and we have:

$$F(x) - F(x-1) = P(x - 1 \le X < x) = \sum_{n=0}^{+\infty} P(A_n).$$

In addition, let us consider $p > 0$ and $B_n = A_{n+p}$. We still have $B_i \cap B_j = \emptyset$
when $i \ne j$, so that $B = \bigcup_{n \in \mathbb{N}} B_n$ satisfies:

$$P(B) = \sum_{n=0}^{+\infty} P(B_n) = \sum_{n=0}^{+\infty} P(A_{n+p}) = \sum_{n=p}^{+\infty} P(A_n).$$

Now,

$$B = \left\{ \omega \in \Omega : x - \frac{1}{p+1} \le X(\omega) < x \right\},$$

and we have $P(B) = \alpha_p$. Thus,

$$\sum_{n=p}^{+\infty} P(A_n) = \alpha_p \ge \alpha > 0.$$

This inequality yields that $\sum_{i=0}^{+\infty} P(A_i) = +\infty$, since all the terms of
the series are non-negative and the complements of its partial sums do not
converge to zero. Thus:

$$F(x) - F(x-1) = +\infty,$$

which is a contradiction, since $F(x) - F(x-1) = P(x - 1 \le X < x) \le 1$.
So, $\alpha = 0$ and we obtain the result claimed.

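vi) is immediate.

vii) We have $F(x + h) - F(x) = P(x < X < x + h) + P(X = x)$. So,
it is enough to show that $f(h) = P(x < X < x + h)$ satisfies:

$$\lim_{h \to 0+} f(h) = 0.$$

Indeed, in this case $F(x+) - F(x) = P(X = x)$ and the result follows from $F(x) = F(x-)$.

$f$ is increasing and bounded on the positive reals: if $0 < h_1 \le h_2$ then
$0 \le f(h_1) \le f(h_2) \le 1$. Thus, there exists a real number $\alpha \ge 0$ such that
$\alpha = \lim_{h \to 0+} f(h)$.

Assume that $\alpha > 0$: let $\alpha_p = f\left(\frac{1}{p+1}\right)$. Then:

$$\alpha_p \to \alpha \text{ when } p \to +\infty \text{ and } \forall p \in \mathbb{N} : 0 < \alpha \le \alpha_p \le 1.$$

Let:

$$A_n = \left\{ \omega \in \Omega : x + \frac{1}{n+2} \le X(\omega) < x + \frac{1}{n+1} \right\}.$$

We have $A_i \cap A_j = \emptyset$ when $i \ne j$, so that $A = \bigcup_{n \in \mathbb{N}} A_n$ satisfies:

$$P(A) = \sum_{n=0}^{+\infty} P(A_n).$$

Now,

$$A = \{ \omega \in \Omega : x < X(\omega) < x + 1 \},$$

and we have:

$$F(x+1) - F(x) = P(x \le X < x + 1) \ge \sum_{n=0}^{+\infty} P(A_n).$$

Thus, $\sum_{n=0}^{+\infty} P(A_n)$ converges and the complements of its partial sums
satisfy:

$$\sum_{n=p}^{+\infty} P(A_n) \to 0 \text{ when } p \to +\infty.$$

Now,

$$\sum_{n=p}^{+\infty} P(A_n) = P\left(x < X < x + \frac{1}{p+1}\right) = f\left(\frac{1}{p+1}\right) = \alpha_p,$$

so that $\alpha_p \to 0$, which contradicts $\alpha > 0$, and we have $\alpha = 0$.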
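Items iii), v) and vii) of Proposition 1.21 can be observed numerically on a discrete random variable. The following Matlab sketch is only an illustration: the values, the probabilities and the step `h` are assumptions chosen for the example.

```matlab
% Discrete random variable: values and probabilities (illustrative choice)
vals = [0 1 2];
p    = [0.2 0.5 0.3];

% Cumulative function with the strict convention F(x) = P(X < x)
F = @(x) sum(p(vals < x));

h = 1e-6;                 % small positive step
x = 1;
Fleft  = F(x - h);        % approximates F(x-)
Fright = F(x + h);        % approximates F(x+)

fprintf('F(x) = %g, F(x-) = %g\n', F(x), Fleft);     % left-continuity
fprintf('F(x+) - F(x-) = %g\n', Fright - Fleft);     % jump, equal to P(X = 1)
fprintf('P(X >= 1) = 1 - F(1) = %g\n', 1 - F(x));
```

With the strict inequality in the definition of $F$, the jump at $x = 1$ is seen only from the right, which is the left-continuity stated in item v).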

The cumulative function may be used in order to define a measure of
probability on $\mathbb{R}$: for every interval $[a, b)$,

$$\mu_F([a, b)) = F(b) - F(a).$$

$\mu_F$ extends to the measurable sets of $\mathbb{R}$ by the procedure described in the
preceding section. When $\mu_F$ is defined by the density $f$, we say that $f$ is the
probability density of $X$. In this case, we have:

$$P(X \in J) = \mu_F(J) = \int_J f \quad \text{for all } J \subset \mathbb{R}.$$

Thus,

$$\forall a, b \in \mathbb{R} : F(b) - F(a) = \int_a^b f \implies F' = f,$$

and the probability density is the derivative of the cumulative function.
Let $g : \mathbb{R} \to \mathbb{R}$ be a function. The mean of $Y = g(X)$ is:

$$E(Y) = E(g(X)) = \int_{\mathbb{R}} g \, \mu_F(dx).$$

Sometimes, the notations $\int_{\mathbb{R}} g \, P(dx)$ or $\int_{\mathbb{R}} g \, P(X \in dx)$ are used. When
$\mu_F$ is defined by the density $f$, we have:

$$E(Y) = E(g(X)) = \int_{\mathbb{R}} g f.$$

Two particular cases corresponding to these expressions are the mean, or
expectation, of $X$, given by:

$$E(X) = \int_{\mathbb{R}} x \, \mu_F(dx)$$

and the moment of order $p$ of $X$ (or $p$-moment of $X$), given by:

$$M_p(X) = E(X^p) = \int_{\mathbb{R}} x^p \, \mu_F(dx).$$

When $\mu_F$ is defined by the density $f$, we have:

$$E(X) = \int_{\mathbb{R}} x f; \quad M_p(X) = \int_{\mathbb{R}} x^p f.$$

The variance of $X$ is:

$$V(X) = E\left((X - E(X))^2\right) = E(X^2) - [E(X)]^2.$$

The standard deviation $\sigma(X)$ of $X$ is the square root of $V(X)$:

$$\sigma(X) = \sqrt{V(X)}.$$

The properties established for finite populations are still valid in the general
case of a random variable: for instance, $E(\alpha X + \beta Y) = \alpha E(X) + \beta E(Y)$.
We also have:

Proposition 1.22.- $V(X) \ge 0$, $\forall X$. In addition, $V(X) = 0$ if and only if
$X = E(X)$ $P$-a.s. ■

Proof.- Let $g(X) = (X - E(X))^2$. Since $g(X) \ge 0$, we have
$\int_{\mathbb{R}} g \, \mu_F(dx) \ge 0$.

Assume that $\int_{\mathbb{R}} g \, \mu_F(dx) = 0$: let $A = \{\omega \in \Omega : g(X(\omega)) > 0\}$: we
show that $P(A) = 0$.

Let $n, k > 0$. Let us consider:

$$A_k = \left\{ \omega \in \Omega : g(X(\omega)) > \frac{1}{k} \right\}$$

and

$$A_{n,k} = \left\{ \omega \in \Omega : n > g(X(\omega)) > \frac{1}{k} \right\}.$$

$X(A_{n,k})$ is a finite union of intervals, so that $X(A_{n,k})$ is $\mu_F$-measurable
and $P(A_{n,k}) = \mu_F(X(A_{n,k}))$.

Now, on the one hand,

$$0 \le \int_{X(A_{n,k})} g \, \mu_F(dx) = \int_{\mathbb{R}} g \, \mu_F(dx) - \int_{\mathbb{R} - X(A_{n,k})} g \, \mu_F(dx) \le \int_{\mathbb{R}} g \, \mu_F(dx) = 0$$

and, on the other hand,

$$\int_{X(A_{n,k})} g \, \mu_F(dx) \ge \frac{1}{k} \int_{X(A_{n,k})} \mu_F(dx) = \frac{1}{k} \mu_F(X(A_{n,k})).$$

So, $\mu_F(X(A_{n,k})) = 0$. In addition, we have $A = \bigcup_{k \in \mathbb{N}} A_k$ and $A_k = \bigcup_{n \in \mathbb{N}} A_{n,k}$.
As a consequence, $A_k = \bigcup_{n \in \mathbb{N}} A_{n,k}$ is negligible, which
implies that $A = \bigcup_{k \in \mathbb{N}} A_k$ is negligible too. ■

Corollary 1.9.- $E(X^2) \ge 0$, $\forall X$. In addition, $E(X^2) = 0$ if and only if
$X = 0$ $P$-a.s. ■

Proof.- We have $E(X^2) = V(X) + [E(X)]^2$. Thus, on the one hand,
$E(X^2) \ge 0$ and, on the other hand, $E(X^2) = 0$ if and only if $V(X) = 0$ and
$E(X) = 0$. ■

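The characteristic function of $X$ is:

$$\varphi(t) = E\left(e^{itX}\right).$$

$\varphi$ has some useful properties. For instance:

- $\varphi$ is uniformly continuous on $\mathbb{R}$, $\varphi(0) = 1$ and $|\varphi(t)| \le 1$, $\forall t \in \mathbb{R}$;

- if $M_p(X) < \infty$ then $\varphi^{(p)}$ (derivative of order $p$ of $\varphi$) satisfies
$\varphi^{(p)}(t) = i^p E\left(X^p e^{itX}\right)$. Namely, $\varphi^{(p)}(0) = i^p M_p(X)$;

- if $M_p(X) < \infty$, $\forall p \in \mathbb{N}$ and the series $S(t) = \sum_{n=0}^{+\infty} \frac{(it)^n}{n!} M_n(X)$
has a strictly positive radius of convergence then $\varphi(t) = S(t)$;

- if $X$ and $Y$ are independent and $Z = X + Y$, then $\varphi_Z(t) = \varphi_X(t) \varphi_Y(t)$;

- if $Y = aX + b$, $a \in \mathbb{R}$ and $b \in \mathbb{R}$, then $\varphi_Y(t) = e^{itb} \varphi_X(at)$;

- if there exists $t \ne 0$ such that $|\varphi(t)| = 1$, then there are $a \in \mathbb{R}$ and $b \in \mathbb{R}$
such that $J = \{an + b : n \in \mathbb{Z}\}$ verifies $P(X \in J) = 1$;

- if $f$ is the probability density of $X$ then $\varphi(t) = \int_{\mathbb{R}} e^{itx} f = TF(f)$,
i.e. $\varphi$ is the Fourier transform of $f$. Reciprocally, if $\varphi \in L^1(\mathbb{R})$, then
$f = TF^{-1}(\varphi) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \varphi(t) \, dt$.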
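Some of these properties can be checked numerically when the density is known. The Matlab sketch below assumes, for illustration only, a uniform density on $(0, 2)$; it integrates the real and imaginary parts of $e^{itx} f(x)$ separately and estimates $\varphi'(0)$ by a centered finite difference (the step `h` is an arbitrary choice).

```matlab
% Uniform density on (a,b) (illustrative choice)
a = 0; b = 2;
f = @(x) ones(size(x))/(b - a);

% Characteristic function phi(t) = E(exp(i t X)), real and imaginary parts
phi = @(t) integral(@(x) cos(t*x).*f(x), a, b) ...
      + 1i*integral(@(x) sin(t*x).*f(x), a, b);

EX    = integral(@(x) x.*f(x), a, b);     % mean of X
h     = 1e-4;
dphi0 = (phi(h) - phi(-h))/(2*h);         % centered estimate of phi'(0)

fprintf('phi(0)   = %g\n', real(phi(0)));
fprintf('|phi(3)| = %g\n', abs(phi(3)));
fprintf('Im phi''(0) = %g, E(X) = %g\n', imag(dphi0), EX);
```

The last line illustrates the relation $\varphi^{(p)}(0) = i^p M_p(X)$ for $p = 1$.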

1.10.1. Matlab implementation

Analogous to the case of finite populations, Matlab may be used in order to
evaluate statistics of random variables.

Let us start by considering the situation where the probability density $f$ of
the scalar random variable $X$ is known. In this case, we may use the
numerical integration quad (or, for recent versions, integral) in order to
evaluate its mean, variance or moments. For instance, by assuming that $X$ is a
scalar random variable having density $f$, supported by $(a, b)$, we may use the
following code:

Listing 1.17. Evaluation of statistics

function [mX,vX,sdX] = statsf(f,a,b)
%
% evaluation of the mean
%
f1 = @(x) x*f(x);
mX = numint(f1,a,b);
%
% evaluation of the second moment
%
f2 = @(x) x^2*f(x);
m2X = numint(f2,a,b);
%
% evaluation of the variance
%
vX = m2X - mX^2;
%
% evaluation of the standard deviation
%
sdX = sqrt(vX);
end

function v = map(f,x,dimf)
%
% maps function f on a set of points x
% each column of x is a point
%
% IN:
% f : the function - type anonymous function
% x : table dimx x np - type array of double
% dimf : dimension of f (number of lines) - type integer
%
% OUT:
% v : table dimf x np - type array of double
%
np = size(x,2);
v = zeros(dimf,np);
for i = 1:np
    xx = x(:,i);
    v(:,i) = f(xx);
end;
return;
end

function v = numint(f,a,b)
%
% evaluates integral(f,a,b)
% f is scalar
%
% IN:
% f : the function - type anonymous function
% a : lower bound of integration - type double
% b : upper bound of integration - type double
%
% OUT:
% v : value of the integral - type double
%
% f is scalar, so its dimension is 1
integrand = @(x) map(f,x,1);
v = quad(integrand,a,b);         % old versions of Matlab
% v = integral(integrand,a,b);   % new versions of Matlab
return;
end


A second situation of interest is the one where the map $X = X(\omega)$ is given
and the density $\varphi$ of $\omega$ is known and supported by $(a, b)$. In this case, we just
need to modify the first function:

Listing 1.18. Evaluation of statistics

function [mX,vX,sdX] = statsphi(phi,X,a,b)
%
% evaluation of the mean
%
f1 = @(om) X(om)*phi(om);
mX = numint(f1,a,b);
%
% evaluation of the second moment
%
f2 = @(om) X(om)^2*phi(om);
m2X = numint(f2,a,b);
%
% evaluation of the variance
%
vX = m2X - mX^2;
%
% evaluation of the standard deviation
%
sdX = sqrt(vX);
end

Finally, in the situation where only a sample of $X$ is available, the statistics
are obtained by considering the sample as a data vector of a finite population
(see section 1.3).
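The last two situations can be compared on a small example. The sketch below is self-contained and does not call the listings above: it uses Matlab's `integral` directly and assumes, for illustration, the map $X(\omega) = \omega^2$ with $\omega$ uniformly distributed on $(0, 1)$. The sample statistics treat the sample as a finite population with equal weights (division by the sample size), which is an assumption about the convention of section 1.3.

```matlab
% Illustrative choice: X(omega) = omega^2, omega uniform on (0,1)
X   = @(om) om.^2;
phi = @(om) ones(size(om));   % density of omega on (a,b)
a = 0; b = 1;

% statistics by numerical integration against the density of omega
mX  = integral(@(om) X(om).*phi(om), a, b);
m2X = integral(@(om) X(om).^2.*phi(om), a, b);
vX  = m2X - mX^2;

% statistics from a sample, treated as a finite population
nr  = 1e5;
xs  = X(rand(nr, 1));
mXs = sum(xs)/nr;
vXs = sum((xs - mXs).^2)/nr;

fprintf('integration: mean = %g, variance = %g\n', mX, vX);
fprintf('sample     : mean = %g, variance = %g\n', mXs, vXs);
```

For this choice the exact values are $E(X) = 1/3$ and $V(X) = 4/45$.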


1.10.2. Couples of random variables

Couples of random variables generalize couples of numerical
characteristics.

Let us consider $X, Y : \Omega \to \mathbb{R}$. The cumulative function of $(X, Y)$ is:

$$F(x, y) = P(X < x, Y < y).$$

This function has the following properties:

- $F \ge 0$;

- $F(-\infty, y) = 0$, $\forall y \in \mathbb{R}$;

- $F(x, -\infty) = 0$, $\forall x \in \mathbb{R}$;

- $F(+\infty, +\infty) = 1$;

- if $x_1 < x_2$ and $y_1 < y_2$ then $0 \le F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1) \le 1$.

The distribution of the single variable $X$ is called marginal distribution of
$X$. It is given by $F_X(x) = P(X < x, Y < +\infty) = F(x, +\infty)$. In an
analogous way, the distribution of $Y$ is called marginal distribution of $Y$. It is
given by $F_Y(y) = F(+\infty, y)$.

Analogous to the one-dimensional situation, the cumulative function $F$ may
be used in order to define a measure on $\mathbb{R}^2$: let us consider a rectangle:

$$\Pi = \{(x, y) \in \mathbb{R}^2 : x_1 \le x < x_2 \text{ and } y_1 \le y < y_2\}.$$

We set

$$\mu_F(\Pi) = F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1).$$

$\mu_F$ is extended to the measurable sets of $\mathbb{R}^2$ in the same way as in the
case of a single random variable (procedure previously described). Similarly to
the preceding situation, we say that $f$ is the probability density of the couple
$(X, Y)$ when $\mu_F$ is defined by the density $f$. In this case,

$$P((X, Y) \in J) = \mu_F(J) = \int_J f \quad \text{for all } J \subset \mathbb{R}^2,$$

and the probability density of $X$ is $f_X(x) = \int_{\mathbb{R}} f(x, y) \, dy$, while the
probability density of $Y$ is $f_Y(y) = \int_{\mathbb{R}} f(x, y) \, dx$.

Let $g : \mathbb{R}^2 \to \mathbb{R}$ be a function. The expectation of $Z = g(X, Y)$ is:

$$E(Z) = E(g(X, Y)) = \int_{\mathbb{R}^2} g \, d\mu_F.$$

When $\mu_F$ is defined by the density $f$, we have:

$$E(Z) = E(g(X, Y)) = \int_{\mathbb{R}^2} g f.$$

Two particular cases corresponding to this general expression are, on the
one hand, the mean or expectation of $X$ and $Y$, given by:

$$E(X) = \int_{\mathbb{R}^2} x \, d\mu_F; \quad E(Y) = \int_{\mathbb{R}^2} y \, d\mu_F$$

and, on the other hand, the moments of order $p$ (or $p$-moments) of $X$ and $Y$, given by:

$$M_p(X) = E(X^p) = \int_{\mathbb{R}^2} x^p \, d\mu_F, \quad M_p(Y) = E(Y^p) = \int_{\mathbb{R}^2} y^p \, d\mu_F.$$

When $\mu_F$ is defined by the density $f$, we have:

$$E(X) = \int_{\mathbb{R}^2} x f; \quad M_p(X) = \int_{\mathbb{R}^2} x^p f;$$

$$E(Y) = \int_{\mathbb{R}^2} y f; \quad M_p(Y) = \int_{\mathbb{R}^2} y^p f.$$

The variance of $X$ is:

$$V(X) = E\left((X - E(X))^2\right) = E(X^2) - [E(X)]^2$$

and the variance of $Y$ is:

$$V(Y) = E\left((Y - E(Y))^2\right) = E(Y^2) - [E(Y)]^2.$$

The covariance of $X$ and $Y$ is:

$$Cov(X, Y) = E\left((X - E(X))(Y - E(Y))\right) = E(XY) - E(X) E(Y).$$

The properties established for finite populations remain valid: for instance,

$$V(\alpha X + \beta Y) = \alpha^2 V(X) + \beta^2 V(Y) + 2 \alpha \beta \, Cov(X, Y)$$

and

$$|Cov(X, Y)| \le \sqrt{V(X)} \sqrt{V(Y)}.$$

The Matlab implementation is an extension of the situation for scalar
variables. If the probability density $f$ of the pair $(X, Y)$ is known and
supported by $(a, b) \times (c, d)$, we may use quad2d (or, for recent versions,
integral2):

Listing 1.19. Evaluation of statistics

function [mX,mY,C] = statsfpair(f,a,b,c,d)
%
% evaluation of the mean of X
%
f1 = @(x,y) x*f(x,y);
mX = numint2d(f1,a,b,c,d);
%
% evaluation of the mean of Y
%
f1 = @(x,y) y*f(x,y);
mY = numint2d(f1,a,b,c,d);
%
% evaluation of the covariance matrix
%
C = zeros(2,2);
%
f2 = @(x,y) (x-mX)^2*f(x,y);
C(1,1) = numint2d(f2,a,b,c,d);
%
f2 = @(x,y) (y-mY)^2*f(x,y);
C(2,2) = numint2d(f2,a,b,c,d);
%
f2 = @(x,y) (x-mX)*(y-mY)*f(x,y);
C(1,2) = numint2d(f2,a,b,c,d);
C(2,1) = C(1,2);
end

function v = map2d(f,x,y)
%
% maps scalar function f on a set of points (x,y)
%
% IN:
% f : the function - type anonymous function
% x : table nx x ny - type array of double
% y : table nx x ny - type array of double
%
% OUT:
% v : table nx x ny - type array of double
%
nx = size(x,1);
ny = size(x,2);
v = zeros(nx,ny);
for i = 1:nx
    for j = 1:ny
        xx = x(i,j);
        yy = y(i,j);
        v(i,j) = f(xx,yy);
    end;
end;
return;
end

function v = numint2d(f,a,b,c,d)
%
% evaluates the integral of f on (a,b) x (c,d)
% f is scalar
%
% IN:
% f : the function - type anonymous function
% a,b,c,d : bounds of integration - type double
%
% OUT:
% v : value of the integral - type double
%
integrand = @(x,y) map2d(f,x,y);
v = quad2d(integrand,a,b,c,d);         % old versions of Matlab
% v = integral2(integrand,a,b,c,d);    % new versions of Matlab
return;
end

In an analogous way, we may consider the situation where $X = X(\omega)$ and
$Y = Y(\omega)$. For instance, if $\omega$ is one-dimensional and its density is $\varphi$, supported
by $(a, b)$, we may use the code:

Listing 1.20. Evaluation of statistics

function [mX,mY,C] = statsphipair(phi,X,Y,a,b)
%
% evaluation of the mean of X
%
f1 = @(om) X(om)*phi(om);
mX = numint(f1,a,b);
%
% evaluation of the mean of Y
%
f1 = @(om) Y(om)*phi(om);
mY = numint(f1,a,b);
%
% evaluation of the covariance matrix
%
C = zeros(2,2);
%
f2 = @(om) (X(om)-mX)^2*phi(om);
C(1,1) = numint(f2,a,b);
%
f2 = @(om) (Y(om)-mY)^2*phi(om);
C(2,2) = numint(f2,a,b);
%
f2 = @(om) (X(om)-mX)*(Y(om)-mY)*phi(om);
C(1,2) = numint(f2,a,b);
C(2,1) = C(1,2);
end
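As an illustration of the pair statistics, the following self-contained sketch evaluates the means and the covariance matrix with Matlab's `integral2`, using vectorized integrands instead of the pointwise mapping used in the listings. The joint density $f(x, y) = x + y$ on $(0, 1) \times (0, 1)$ is an assumption made for the example.

```matlab
% Illustrative joint density on (a,b) x (c,d)
f = @(x,y) x + y;
a = 0; b = 1; c = 0; d = 1;

% means of X and Y
mX = integral2(@(x,y) x.*f(x,y), a, b, c, d);
mY = integral2(@(x,y) y.*f(x,y), a, b, c, d);

% covariance matrix
C = zeros(2,2);
C(1,1) = integral2(@(x,y) (x - mX).^2.*f(x,y), a, b, c, d);
C(2,2) = integral2(@(x,y) (y - mY).^2.*f(x,y), a, b, c, d);
C(1,2) = integral2(@(x,y) (x - mX).*(y - mY).*f(x,y), a, b, c, d);
C(2,1) = C(1,2);

disp([mX mY]);
disp(C);
```

For this density, a direct calculation gives $E(X) = E(Y) = 7/12$, $V(X) = V(Y) = 11/144$ and $Cov(X, Y) = -1/144$.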


The extension to the situation where $\omega$ is multidimensional is obtained by modifying this code
in order to perform multidimensional integration and will not be stated here.


1.11. Hilbertian properties of random variables

In the same way as numerical characteristics, random variables also
possess a Hilbertian structure. The main difference between finite populations
and the general situation under study is the use of a probability
measure $P$, defined on $\Omega$, which is the cornerstone of the construction:
$P$ defines null sets, almost sure and negligible events. $P$ is the basis of the
whole construction, but it remains an abstract probability, which intervenes
only in the initial definitions and not in the practical calculations or in the
most advanced results of the theory under construction. The fundamental role
of $P$ is the definition of the equivalence between two random variables:
analogous to section 1.7, we have:

$$X \approx Y \iff A = \{\omega \in \Omega : X(\omega) \ne Y(\omega)\} \text{ is } P\text{-negligible}.$$

Let us consider the set formed by the random variables on $\Omega$:

$$C(\Omega) = \{X : \Omega \to \mathbb{R}\}$$

and the linear subspace formed by the simple random variables on $\Omega$:

$$V(\Omega) = \{X \in C(\Omega) : X(\Omega) \text{ is finite}\}.$$

The relation $\approx$ is a relation of equivalence on $C(\Omega)$:

$$X \approx X; \quad X \approx Y \Rightarrow Y \approx X; \quad X \approx Y \text{ and } Y \approx Z \Rightarrow X \approx Z.$$

The class of equivalence of $X$ is:

$$\tilde{X} = \{Y \in C(\Omega) : Y \approx X\}.$$

Let us denote by $C(\Omega, P)$ the set of all the classes of equivalence:

$$C(\Omega, P) = \{\tilde{X} : X \in C(\Omega)\}$$

and by $V(\Omega, P)$ the set of the classes of equivalence of the simple random
variables:

$$V(\Omega, P) = \{\tilde{X} \in C(\Omega, P) : X \in V(\Omega)\}.$$

For $\tilde{X}, \tilde{Y} \in V(\Omega, P)$, we consider:

$$(\tilde{X}, \tilde{Y}) = E(XY). \qquad [1.18]$$

Then:

Proposition 1.23.- $(\cdot, \cdot)$ is a scalar product on $V(\Omega, P)$. ■

Proof.- The definition corresponds to a bilinear symmetric form. In
addition, $(\tilde{X}, \tilde{X}) = E(X^2) \ge 0$ for all $\tilde{X} \in V(\Omega, P)$. Finally, if
$(\tilde{X}, \tilde{X}) = E(X^2) = 0$, then $\tilde{X} = 0$ and the form is a positive definite one. ■

Let $L^2(\Omega, P)$ be the completion of $V(\Omega, P)$ for the scalar product defined
in equation [1.18]: $L^2(\Omega, P)$ is a Hilbert space for the scalar product [1.18].
The norm of an element $X \in L^2(\Omega, P)$ is:

$$\|X\| = \sqrt{E(X^2)}. \qquad [1.19]$$

If we are interested in vectors of random variables, the elements above may
be extended by considering product spaces. For instance, if we are interested
in vectors $X = (X_1, \ldots, X_k)$, we consider $[V(\Omega)]^k$ and

$$(X, Y) = E(X.Y) = \sum_{i=1}^{k} E(X_i Y_i). \qquad [1.20]$$

In this case,

$$\|X\| = \sqrt{\sum_{i=1}^{k} E(X_i^2)}. \qquad [1.21]$$

The completion of $[V(\Omega, P)]^k$ is $[L^2(\Omega, P)]^k$. In a manner analogous to
that of a finite population, all the approaches below extend to vectors of
numerical characteristics in this way.


1.11.1. Approximation

As previously observed, the Hilbertian structure of $L^2(\Omega, P)$ makes the
application of techniques and results issued from the theory of Hilbert spaces
possible, namely those connected to orthogonal projections. Here also, the
results established for finite populations extend to the general situation.

1.11.1.1. Conditional probability and approximation

One of the remarkable properties connected to orthogonal projections is
the notion of conditional probability. Let us consider two events $A, B \subset \Omega$.
As previously observed, an event may be represented by its characteristic
function: $1_A$ and $1_B$ represent $A$ and $B$, respectively. We may consider the
approximation of $1_B$ in terms of $1_A$: for instance, we may look for the
coefficient $\lambda \in \mathbb{R}$ such that $\lambda 1_A$ is the closest possible to $1_B$. Thus, we may
look for the orthogonal projection of $1_B$ onto a linear subspace having
dimension 1 and given by:

$$S = \{Z \in L^2(\Omega, P) : Z(\omega) = s 1_A(\omega), s \in \mathbb{R}, \forall \omega \in \Omega\}.$$

Analogous to the result for finite populations, we have $\lambda = P(B \mid A)$,
where

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} \text{ if } P(A) \ne 0; \quad P(B \mid A) = 0 \text{ otherwise}. \qquad [1.22]$$

Symmetrically, we have:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \text{ if } P(B) \ne 0; \quad P(A \mid B) = 0 \text{ otherwise}.$$

Here also, we have:

$$P(A \cap B) = P(B \mid A) . P(A) = P(A \mid B) . P(B). \qquad [1.23]$$

As for finite populations, $A$ and $B$ are said to be independent if and only if
$P(A \mid B) = P(A)$ or $P(B \mid A) = P(B)$. If $A$ and $B$ are independent, then
$P(A \cap B) = P(A) P(B)$.
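The projection interpretation of [1.22] can be illustrated on a finite example. In the Matlab sketch below, the fair die and the events $A$ ("even outcome") and $B$ ("outcome greater than 3") are assumptions chosen for the illustration; the coefficient $\lambda$ is computed from the scalar product as $(1_A, 1_B)/(1_A, 1_A)$ and compared with $P(A \cap B)/P(A)$.

```matlab
% Fair die (illustrative finite example)
omega = 1:6;
P  = ones(1,6)/6;                   % probability of each outcome
IA = double(mod(omega,2) == 0);     % indicator of A: even outcome
IB = double(omega > 3);             % indicator of B: outcome > 3

% coefficient of the orthogonal projection of 1_B onto span{1_A}
lambda = sum(P.*IA.*IB)/sum(P.*IA.^2);

% direct evaluation of the probabilities
PA  = sum(P.*IA);
PB  = sum(P.*IB);
PAB = sum(P.*IA.*IB);

fprintf('lambda = %g, P(A and B)/P(A) = %g\n', lambda, PAB/PA);
fprintf('P(A and B) = %g, P(A)P(B) = %g\n', PAB, PA*PB);
```

Here $\lambda = 2/3$, while $P(A \cap B) = 1/3 \ne P(A) P(B) = 1/4$, so these two events are not independent.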

1.11.1.2. Approximation of a random variable by a constant

When considering the best approximation of a random variable $X$ by a
constant, we may use the same approach as for finite populations and look for
the orthogonal projection of $X$ onto a linear subspace having dimension 1 and
given by:

$$S = \{Z \in L^2(\Omega, P) : Z \text{ is constant: } Z(\omega) = s \in \mathbb{R}, \forall \omega \in \Omega\}.$$

The solution is, as in the finite population case, $PX = E(X)$ and the
error in the approximation is $\|X - PX\| = \sqrt{V(X)}$.

As in the finite population case, when considering vectors of random
variables, the best approximation of $X = (X_1, \ldots, X_k)$ by a constant vector
is $m = E(X) = (m_1, \ldots, m_k)$, where $m_i = E(X_i)$. The error in the
approximation is

$$\|X - m\| = \sqrt{\sum_{i=1}^{k} E\left((X_i - m_i)^2\right)}.$$

As previously observed, Matlab implementation may be performed (see
section 1.10.1).
1.11.1.3. Approximation of a random variable by an affine function of another
random variable

In the situation where we look for the best approximation of the random
variable $Y$ by an affine function of $X$, the approach is also entirely analogous
to the approach introduced for finite populations: we look for the orthogonal
projection of $Y$ onto the linear subspace having dimension 2, given by:

$$S = \{s \in L^2(\Omega, P) : s = \alpha X + \beta; \ \alpha, \beta \in \mathbb{R}\}.$$

The solution is $PY = aX + b$, where:

$$a = \frac{Cov(X, Y)}{V(X)}, \quad b = E(Y) - a E(X).$$

We also have:

$$\|Y - aX - b\| = \sqrt{V(Y) \left(1 - [\rho(X, Y)]^2\right)},$$

where:

$$\rho(X, Y) = \frac{Cov(X, Y)}{\sqrt{V(X) V(Y)}}$$

is the linear correlation coefficient between $X$ and $Y$. We have $|\rho(X, Y)| \le 1$
and the error is null if and only if $|\rho(X, Y)| = 1$.

Analogous to the finite population case, when considering vectors of
random variables, the best approximation of $Y = (Y_1, \ldots, Y_k)$ by an affine
function of $X = (X_1, \ldots, X_m)$ is $AX + B$, where $A$ is determined by
solving, for $i = 1, \ldots, k$,

$$C.a = b,$$

with $a = (a_1, \ldots, a_m)^t \in \mathcal{M}(m, 1)$ such that $a_j = A_{ij}$, $C \in \mathcal{M}(m, m)$
such that $C_{rj} = Cov(X_j, X_r)$ and $b = (b_1, \ldots, b_m)^t \in \mathcal{M}(m, 1)$ such that
$b_r = Cov(Y_i, X_r)$. After the determination of $A$, $B$ is given by:

$$B = E(Y) - A.E(X).$$

In the scalar situation (both $X$ and $Y$ are real-valued), Matlab
implementation may be performed by using the programs introduced in
section 1.10.2.
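In the scalar situation, the affine approximation may be sketched on a small data set treated as a finite population with equal weights. The data vectors below are arbitrary values chosen for the illustration; the sketch computes $a$, $b$ and $\rho$ from the formulas above and compares the error formula with the norm of the residual evaluated directly.

```matlab
% Illustrative data (finite population, equal weights)
x = [1 2 3 4 5];
y = [2 4 5 4 5];
n = numel(x);

mX = sum(x)/n;  mY = sum(y)/n;
vX = sum((x - mX).^2)/n;            % V(X)
vY = sum((y - mY).^2)/n;            % V(Y)
cXY = sum((x - mX).*(y - mY))/n;    % Cov(X,Y)

a   = cXY/vX;                       % slope
b   = mY - a*mX;                    % intercept
rho = cXY/sqrt(vX*vY);              % linear correlation coefficient

err_formula = sqrt(vY*(1 - rho^2));            % error from the formula
err_direct  = sqrt(sum((y - a*x - b).^2)/n);   % norm of the residual

fprintf('a = %g, b = %g, rho = %g\n', a, b, rho);
fprintf('error: %g (formula), %g (direct)\n', err_formula, err_direct);
```

For these data, $a = 0.6$, $b = 2.2$ and both evaluations of the error give $\sqrt{0.48}$.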

1.11.1.4. Approximation of a random variable by a non-parametric family of
functions of another one

When we are interested in the best approximation of $Y$ by a family of
non-parametric functions of $X$, we may also use the same approach as for
finite populations and look for the orthogonal projection of $Y$ onto the linear
subspace given by:

$$S = \{s \in L^2(\Omega, P) : s = \varphi(X); \ \varphi : \mathbb{R} \to \mathbb{R}\}.$$

In this case, the projection $g(X)$ of $Y$ satisfies:

$$(Y - g(X), \varphi(X)) = 0, \ \forall \varphi : \mathbb{R} \to \mathbb{R},$$

i.e.,

$$\int_{\mathbb{R}^2} (y - g(x)) \varphi(x) \, d\mu_F = 0, \ \forall \varphi : \mathbb{R} \to \mathbb{R},$$

where $\mu_F$ is the measure associated with the cumulative function $F$ of the pair $(X, Y)$. When $\mu_F$ is defined
by a density $f$, we have:

$$\int_{\mathbb{R}^2} (y - g(x)) \varphi(x) f(x, y) = 0, \ \forall \varphi : \mathbb{R} \to \mathbb{R}, \qquad [1.24]$$

so that:

$$\int_{\mathbb{R}^2} y f(x, y) \varphi(x) = \int_{\mathbb{R}^2} g(x) f(x, y) \varphi(x), \ \forall \varphi : \mathbb{R} \to \mathbb{R},$$

and:

$$\int_{\mathbb{R}} \left\{ \int_{\mathbb{R}} y f(x, y) \, d\ell(y) \right\} \varphi(x) \, d\ell(x) = \int_{\mathbb{R}} \left\{ \left[ \int_{\mathbb{R}} f(x, y) \, d\ell(y) \right] g(x) \right\} \varphi(x) \, d\ell(x).$$

Thus,

$$\int_{\mathbb{R}} y f(x, y) \, d\ell(y) = \left[ \int_{\mathbb{R}} f(x, y) \, d\ell(y) \right] g(x).$$

By assuming that $\int_{\mathbb{R}} f(x, y) \, d\ell(y) \ne 0$ for all $x$, the solution is:

$$g(x) = \int_{\mathbb{R}} y f(y \mid X = x) \, d\ell(y),$$

where:

$$f(y \mid X = x) = f(x, y) / \int_{\mathbb{R}} f(x, y) \, d\ell(y) = f(x, y) / f_X(x).$$

The function $g$ thus defined is called conditional mean of $Y$ with respect
to $X$ or conditional expectation of $Y$ with respect to $X$, and is denoted by
$E(Y \mid X)$ (i.e. $g(x) = E(Y \mid X = x)$).

$f(y \mid X = x)$ is the conditional distribution of $Y$ with respect to $X$.
Sometimes, the expression distribution of $Y$ conditional to $X$ is also used. In
an analogous way, the probability density of $X$ conditioned to $Y$ is:

$$f(x \mid Y = y) = f(x, y) / \int_{\mathbb{R}} f(x, y) \, d\ell(x) = f(x, y) / f_Y(y)$$

and we have:

$$E(X \mid Y = y) = \int_{\mathbb{R}} x f(x \mid Y = y) \, d\ell(x).$$

The random variables forming the couple are independent if and only if

$$\forall x, y : f(x \mid Y = y) = f_X(x) \text{ or } f(y \mid X = x) = f_Y(y),$$

i.e.,

$$\forall x, y : f(x, y) = f_X(x) f_Y(y).$$
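The conditional mean $g(x) = E(Y \mid X = x)$ and the independence criterion can be evaluated numerically when the joint density is known. The Matlab sketch below assumes, for illustration only, the joint density $f(x, y) = x + y$ on $(0, 1) \times (0, 1)$; the function names are chosen for the example.

```matlab
% Illustrative joint density on (0,1) x (0,1)
f  = @(x,y) x + y;

% marginal densities (x or y is a scalar in these calls)
fX = @(x) integral(@(y) f(x,y), 0, 1);
fY = @(y) integral(@(x) f(x,y), 0, 1);

% conditional mean g(x) = E(Y | X = x)
g  = @(x) integral(@(y) y.*f(x,y), 0, 1)/fX(x);

x0 = 0.2; y0 = 0.3;
fprintf('E(Y | X = %g) = %g\n', x0, g(x0));
fprintf('f(x0,y0) = %g, fX(x0)*fY(y0) = %g\n', f(x0,y0), fX(x0)*fY(y0));
```

For this density, $g(x) = (3x + 2)/(6x + 3)$, so $g(0.2) = 13/21$, and $f(x, y) \ne f_X(x) f_Y(y)$: the two variables are not independent.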

Proposition 1.24.- Let $X$ and $Y$ be two independent random variables such
that the density of the pair $(X, Y)$ is $f$. Then, for $g, h : \mathbb{R} \to \mathbb{R}$:

$$E(g(X) h(Y)) = E(g(X)) E(h(Y)).$$

Proof.- We have $f(x, y) = f_X(x) f_Y(y)$, so that:

$$E(g(X) h(Y)) = \int_{\mathbb{R}^2} g(x) h(y) f(x, y) = \int_{\mathbb{R}^2} g(x) h(y) f_X(x) f_Y(y),$$

and Fubini's theorem shows that:

$$E(g(X) h(Y)) = \int_{\mathbb{R}} \left[ \int_{\mathbb{R}} h(y) f_Y(y) \, d\ell(y) \right] g(x) f_X(x) \, d\ell(x),$$

$$E(g(X) h(Y)) = \left[ \int_{\mathbb{R}} h(y) f_Y(y) \, d\ell(y) \right] \left[ \int_{\mathbb{R}} g(x) f_X(x) \, d\ell(x) \right] = E(g(X)) E(h(Y)). \ ■$$

Corollary 1.10.- Let $X$ and $Y$ be two independent random variables such
that the probability density of the pair $(X, Y)$ is $f$ and the characteristic function of
the pair is $\varphi$. Then:

i) $E(XY) = E(X) E(Y)$ and $Cov(X, Y) = 0$;

ii) $E(Y \mid X = x) = E(Y)$ and $E(X \mid Y = y) = E(X)$ for all $(x, y)$;

iii) $\varphi(t) = \varphi_X(t_1) \varphi_Y(t_2)$, $\forall t = (t_1, t_2) \in \mathbb{R}^2$. ■


Proof.- The result is immediate. ■

Proposition 1.25.- Let $X$ and $Y$ be two random variables such that the
density of the pair $(X, Y)$ is $f$ and the pair characteristic function is $\varphi$. Then
$X$ and $Y$ are independent if and only if $\varphi(t) = \varphi_X(t_1) \varphi_Y(t_2)$, $\forall t = (t_1, t_2) \in \mathbb{R}^2$.

Proof.- ($\Rightarrow$): this implication follows from the preceding result.

($\Leftarrow$): We have:

$$\varphi_X(t_1) \varphi_Y(t_2) = \left[ \int_{\mathbb{R}} e^{i t_2 y} f_Y(y) \, d\ell(y) \right] \left[ \int_{\mathbb{R}} e^{i t_1 x} f_X(x) \, d\ell(x) \right],$$

so that:

$$\varphi_X(t_1) \varphi_Y(t_2) = \int_{\mathbb{R}} \left[ \int_{\mathbb{R}} e^{i t_2 y} f_Y(y) \, d\ell(y) \right] e^{i t_1 x} f_X(x) \, d\ell(x)$$

and Fubini's theorem shows that:

$$\varphi_X(t_1) \varphi_Y(t_2) = \int_{\mathbb{R}^2} e^{i(t_1 x + t_2 y)} f_X(x) f_Y(y).$$

Since $\varphi_X(t_1) \varphi_Y(t_2) = \varphi(t)$, $\forall t \in \mathbb{R}^2$, we have:

$$\int_{\mathbb{R}^2} e^{i(t_1 x + t_2 y)} f_X(x) f_Y(y) = \int_{\mathbb{R}^2} e^{i(t_1 x + t_2 y)} f(x, y),$$

so that the injectivity of the Fourier transform implies that:

$$f(x, y) = f_X(x) f_Y(y) \text{ on } \mathbb{R}^2. \ ■$$

1.12. Sequences of random variables

1.12.1. Quadratic mean convergence

The Hilbertian structure of the set $L^2(\Omega, P)$ induces a first notion of
convergence, connected to the norm and the scalar product in this space: we
say that the sequence $\{U_n\}_{n \in \mathbb{N}}$ converges to $U$ in quadratic mean if and
only if

$$\|U_n - U\| = \sqrt{E\left((U_n - U)^2\right)} \to 0 \text{ when } n \to +\infty.$$

This convergence is denoted by $U_n \to U$ q.m. It is the equivalent for
random variables of the strong convergence in the space of the square
summable functions ($L^2$) and has analogous properties. For instance:

$$U_n \to U \text{ q.m.} \implies \|U_n\| = \sqrt{E(U_n^2)} \to \|U\| = \sqrt{E(U^2)};$$

$$U_n \to U \text{ q.m.} \implies E(U_n^2) \to E(U^2) \text{ and } E(U_n) \to E(U).$$
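Quadratic mean convergence can be observed by Monte Carlo simulation. The sequence $U_n = U + Z/n$, with $U$ uniform on $(0, 1)$ and $Z$ standard Gaussian, is an assumption made for this illustration; for it, $E((U_n - U)^2) = 1/n^2$.

```matlab
% Illustrative sequence: U_n = U + Z/n
nr = 1e5;                     % sample size
U  = rand(nr, 1);             % variates from U
Z  = randn(nr, 1);            % variates from Z
ns = [1 10 100];              % values of n under consideration
msq = zeros(size(ns));
mUn = zeros(size(ns));
for k = 1:numel(ns)
    Un     = U + Z/ns(k);
    msq(k) = sum((Un - U).^2)/nr;   % estimate of E((U_n - U)^2)
    mUn(k) = sum(Un)/nr;            % estimate of E(U_n)
end
disp(msq);    % decreases approximately like 1/n^2
disp(mUn);    % approaches the estimate of E(U)
```

The estimated means of $U_n$ approach the mean of $U$, as stated by the last property above.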

1.12.2. Convergence in the mean

A second notion of strong convergence is the following: we say that the
sequence $\{U_n\}_{n \in \mathbb{N}}$ converges to $U$ in the mean if and only if:

$$E(|U_n - U|) \to 0 \text{ when } n \to +\infty.$$

This convergence is denoted as $U_n \to U$ m. It is the equivalent for
random variables of the strong convergence of summable functions (space
$L^1$) and has analogous properties. For instance:

$$U_n \to U \text{ m.} \implies E(U_n) \to E(U).$$

1.12.3. Convergence in probability

We say that the sequence $\{U_n\}_{n \in \mathbb{N}}$ converges to $U$ in probability if and
only if:

$$\forall \varepsilon > 0 : P(|U_n - U| > \varepsilon) \to 0 \text{ when } n \to +\infty.$$

This convergence is denoted by $U_n \to U$ p. It is the equivalent for
random variables of the convergence in measure usually introduced when
considering measure spaces and has analogous properties. We observe that
this convergence may be interpreted in terms of events. Let:

$$E_n(\varepsilon) = \text{"}|U_n - U| > \varepsilon\text{"} = \{\omega \in \Omega : |U_n(\omega) - U(\omega)| > \varepsilon\}.$$

Then,

$$U_n \to U \text{ p.} \iff \forall \varepsilon > 0 : P(E_n(\varepsilon)) \to 0 \text{ when } n \to +\infty.$$

1.12.4. Almost sure convergence

We say that the sequence $\{U_n\}_{n \in \mathbb{N}}$ converges to $U$ almost surely if and
only if:

$$P(U_n \to U) = 1.$$

This convergence is denoted as $U_n \to U$ a.s. It is the equivalent for
random variables of the almost everywhere convergence usually introduced when
considering measure spaces and has analogous properties. This convergence
has an interpretation in terms of events. Let:

$$A = \text{"}U_n \to U\text{"} = \{\omega \in \Omega : U_n(\omega) \to U(\omega)\}.$$

Then,

$$U_n \to U \text{ a.s.} \iff P(A) = 1,$$

i.e. $A$ is almost sure.

1.12.5. Convergence in distribution

Let $C_b(\mathbb{R}) = \{\varphi : \mathbb{R} \to \mathbb{R} : \varphi \text{ is continuous and bounded}\}$. We say that
the sequence $\{U_n\}_{n \in \mathbb{N}}$ converges to $U$ in distribution if and only if:

$$\forall \varphi \in C_b(\mathbb{R}) : E(\varphi(U_n)) \to E(\varphi(U)) \text{ when } n \to +\infty.$$

This convergence is denoted as $U_n \to U$ D. It is the equivalent for
random variables of a particular type of weak convergence (i.e. convergence
in the dual of a particular space). We have the following theorem:

Theorem 1.1.- Let $F_n$ be the cumulative function of $U_n$ and $F$ the
cumulative function of $U$. Then,

$$U_n \to U \text{ D.} \iff F_n(x) \to F(x) \text{ when } n \to +\infty$$

at any $x$ where $F$ is continuous. ■

We also have the following theorem:

Theorem 1.2.- Let $F_n$ be the cumulative function of $U_n$ and $F$ the
cumulative function of $U$. Then $U_n \to U$ D. if and only if both of the
following conditions are satisfied:

i) $\langle 1_{\mathbb{R}}, F_n \rangle \to \langle 1_{\mathbb{R}}, F \rangle$ when $n \to +\infty$;

ii) $\forall \varphi \in C_S(\mathbb{R}) : \langle \varphi, F_n \rangle \to \langle \varphi, F \rangle$ when $n \to +\infty$

($C_S(\mathbb{R}) = \{\varphi : \mathbb{R} \to \mathbb{R} : \varphi \text{ is continuous and has a bounded support}\}$; $1_{\mathbb{R}}$ is the constant function
taking the value 1 at any point). ■

Theorem 1.3 (Levy's theorem).- Let $\varphi_n$ be the characteristic function of $U_n$
and $\varphi$ the characteristic function of $U$. $U_n \to U$ D. if and only if $\varphi_n(t) \to \varphi(t)$ a.e. on $\mathbb{R}$. ■

1.12.6. Relations among different types of convergence

The connections between the convergences are the following:

Theorem 1.4.-

1) $U_n \to U$ q.m. $\implies U_n \to U$ m.

2) $U_n \to U$ m. $\implies U_n \to U$ p.

3) $U_n \to U$ p. $\implies U_{n(k)} \to U$ a.s. for a subsequence.

4) $U_n \to U$ m. $\implies U_{n(k)} \to U$ a.s. for a subsequence.

5) $U_n \to U$ a.s. $\implies U_n \to U$ p.

6) $U_n \to U$ p. $\implies U_n \to U$ D. ■

1.13. Some usual distributions

The reader may find in the literature the most used probability distributions.
Here, we focus on the probability distributions which are used in the following.

1.13.1. Poisson distribution

Definition 1.21 (Poisson distribution).- Let $\lambda \in \mathbb{R}$ be such that $\lambda > 0$. We
say that a random variable $X$ has the Poisson distribution $\mathcal{P}(\lambda)$ or that $X$ is
Poissonian of parameter $\lambda$ if and only if, on the one hand, $X$ takes the values
0, 1, ... (i.e. its image is $\mathbb{N}$) and, on the other hand,

$$\forall n \in \mathbb{N} : P(X = n) = \frac{\lambda^n}{n!} e^{-\lambda}.$$

In this case, $E(X) = V(X) = \lambda$. The characteristic function of $X$ is
$\varphi(t) = \exp(\lambda(\exp(it) - 1))$.
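These properties of the Poisson distribution can be verified numerically by truncating the sums. In the Matlab sketch below, the parameter $\lambda = 3.5$, the truncation at $n = 100$ and the point $t = 0.7$ are assumptions made for the example; the probabilities are evaluated through logarithms (`gammaln`) to avoid overflow of the factorial.

```matlab
lambda = 3.5;                      % illustrative parameter
n = 0:100;                         % truncated range of values

% P(X = n) = lambda^n exp(-lambda)/n!, evaluated through logarithms
p = exp(-lambda + n*log(lambda) - gammaln(n + 1));

total = sum(p);                    % should be close to 1
mX = sum(n.*p);                    % mean
vX = sum(n.^2.*p) - mX^2;          % variance

t = 0.7;
phi_num = sum(exp(1i*t*n).*p);             % E(exp(itX)) from the definition
phi_th  = exp(lambda*(exp(1i*t) - 1));     % closed form

fprintf('sum = %g, mean = %g, variance = %g\n', total, mX, vX);
fprintf('|phi_num - phi_th| = %g\n', abs(phi_num - phi_th));
```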


1.13.2. Uniform distribution

Definition 1.22 (uniform distribution).- Let $a, b \in \mathbb{R}$ be such that $a < b$.
We say that a random variable X has a uniform distribution $U(a, b)$ or that X
is uniformly distributed on $(a, b)$ if and only if its probability density is:

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & \text{if } x \in (a, b) \\ 0, & \text{otherwise.} \end{cases}$$

We have $E(X) = (a + b)/2$ and $V(X) = (b - a)^2/12$. The characteristic
function associated to $U(a, b)$ is:

$$\varphi(t) = \frac{\exp(ibt) - \exp(iat)}{i(b-a)t} = \frac{2}{(b-a)t} \sin\left(\frac{(b-a)t}{2}\right) \exp\left(i\,\frac{(b+a)t}{2}\right).$$

In an abbreviated way, we say simply that X is $U(a, b)$.
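As an illustration, the two closed forms of the characteristic function of a uniform distribution on $(a, b)$, namely $(e^{ibt} - e^{iat})/(i(b-a)t)$ and its sine-exponential rewriting, can be compared with a Monte Carlo estimate of $E(\exp(itX))$. This is only a sketch: the values of a, b, t and the sample size N are assumptions chosen for the example.

```matlab
% Characteristic function of U(a,b): closed forms versus a Monte Carlo estimate
a = -1;  b = 2;  t = 0.7;          % assumed illustrative values, a < b, t ~= 0
N = 200000;                        % assumed sample size
x = a + (b - a) * rand(1, N);      % variates from U(a, b)
phi_mc = mean(exp(1i * t * x));    % estimate of E(exp(itX))
phi_1 = (exp(1i*b*t) - exp(1i*a*t)) / (1i * (b - a) * t);
phi_2 = 2 / ((b - a) * t) * sin((b - a) * t / 2) * exp(1i * (b + a) * t / 2);
fprintf('|phi_1 - phi_2| = %g, |phi_mc - phi_1| = %g\n', ...
        abs(phi_1 - phi_2), abs(phi_mc - phi_1));
```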


1.13.3. Normal or Gaussian distribution

Definition 1.23 (Normal or Gaussian distribution).- Let $m \in \mathbb{R}$, $\sigma \in \mathbb{R}$ be
such that $\sigma \ge 0$. We say that a random variable X is normally distributed
$N(m, \sigma)$ or X is Gaussian $N(m, \sigma)$ if and only if its characteristic function
is:

$$\varphi(t) = \exp\left(imt - \frac{\sigma^2 t^2}{2}\right).$$

In an abbreviated way, we say simply that X is $N(m, \sigma)$.

$N(m, \sigma)$ admits moments of an arbitrary order and we have $E(X) = m$,
$V(X) = \sigma^2$.

For $\sigma = 0$, the Gaussian distribution is degenerate: its probability density
$f$ is a Dirac measure $f(x) = \delta(x - m)$, so that $P(X = m) = 1$ and the
value m is a.s., while any other value is negligible.

For $\sigma > 0$, the probability density of $N(m, \sigma)$ is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - m)^2}{2\sigma^2}\right).$$
The interest of Gaussian distributions is connected to the central limit
theorem: the suitably normalized partial sums of independent and identically
distributed random variables of finite variance are approximately
Gaussian.

When $m = 0$ and $\sigma = 1$, we say that the distribution is a reduced one,
i.e. X is a standard Gaussian variable or simply that X is standard Gaussian.
The standard Gaussian distribution is denoted as $N(0, 1)$ (for instance, we say
that X has the distribution $N(0, 1)$ or that X is $N(0, 1)$) and its probability
density is:

$$f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$$

and its characteristic function is:

$$\varphi(t) = \exp\left(-\frac{t^2}{2}\right).$$

We have $E(X) = 0$ and $V(X) = 1$ for the standard distribution. The
moments of the standard distribution are given by:

$$M_p(X) = 0, \text{ if } p = 2n + 1 \text{ (odd)}; \qquad M_p(X) = \frac{(2n)!}{2^n\, n!}, \text{ if } p = 2n \text{ (even)}.$$
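The closed form of the even moments of the standard Gaussian distribution, $(2n)!/(2^n n!)$ for $p = 2n$, can be compared with a numerical quadrature of $\int x^p f(x)\,dx$, where $f$ is the $N(0,1)$ density. The sketch below assumes that the function integral (available in MATLAB and in recent versions of Octave) is used with its default tolerances; the range of n is an arbitrary choice.

```matlab
% Even moments of N(0,1): numerical quadrature versus (2n)! / (2^n n!)
f = @(x) exp(-x.^2 / 2) / sqrt(2 * pi);        % density of N(0,1)
num = zeros(1, 4);  ref = zeros(1, 4);
for n = 1:4
    p = 2 * n;
    num(n) = integral(@(x) x.^p .* f(x), -Inf, Inf);    % numerical M_p
    ref(n) = factorial(2 * n) / (2^n * factorial(n));   % closed form
    fprintf('p = %d: numerical = %.6f, closed form = %g\n', p, num(n), ref(n));
end
```

For these values of n the closed form gives 1, 3, 15 and 105.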
1.13.4. Gaussian vectors

Definition 1.24 (Gaussian random vectors).- Let $X = (X_1, ..., X_k)$ be a
random vector. We say that X is a Gaussian random vector (or simply
Gaussian vector) having distribution $N(m, C)$ if and only if its
characteristic function is:

$$\varphi(t) = \exp\left(i\,(t, m)_k - \frac{1}{2}(t, Ct)_k\right),$$

where $m_i = E(X_i)$ and $C = (C_{ij})_{1 \le i,j \le k}$ is the covariance matrix of X,
defined as:

$$C_{ij} = Cov(X_i, X_j) = E(X_i X_j) - E(X_i) E(X_j). \; ■$$

The mean of X is $E(X) = m = (m_1, ..., m_k)$. We have $V(X_i) = C_{ii}$.

Gaussian vectors may also be characterized as follows: 

Proposition 1.26.- $X = (X_1, ..., X_k)$ is a Gaussian vector if and only if every
linear combination of its components is Gaussian, i.e.,

$$\forall t \in \mathbb{R}^k : (t, X)_k \text{ is Gaussian.} \; ■$$

Proof.- Let $t \in \mathbb{R}^k$, $Z(t) = (t, X)_k$. The characteristic function of $Z(t)$ is:

$$\varphi_Z(u) = E(\exp(iuZ)) = E\left(\exp\left(iu \sum_{i=1}^{k} t_i X_i\right)\right) = \varphi_X(ut).$$

Let $C = (C_{ij})_{1 \le i,j \le k}$ be the covariance matrix of X:
$C_{ij} = Cov(X_i, X_j)$.

$\Longrightarrow$ : Assume that X is Gaussian and has the
distribution $N(m, C)$. Then:

$$\varphi_X(ut) = \exp\left(iu\,(t, m)_k - \frac{1}{2} u^2 (t, Ct)_k\right)$$

so that,

$$\varphi_Z(u) = \exp\left(i m_Z u - \frac{1}{2}\sigma_Z^2 u^2\right),$$

where $m_Z \in \mathbb{R}$, $\sigma_Z \in \mathbb{R}$ are such that $\sigma_Z \ge 0$ and

$$m_Z = (t, m)_k, \qquad \sigma_Z^2 = (t, Ct)_k = \sum_{i,j=1}^{k} C_{ij} t_i t_j \ge 0,$$

since C is positive. Thus $Z(t)$ is Gaussian of distribution $N(m_Z, \sigma_Z)$.

$\Longleftarrow$ : Assume that $Z(t)$ is Gaussian for any $t \in \mathbb{R}^k$. Then:

$$\varphi_Z(u) = \exp\left(i m_Z u - \frac{1}{2}\sigma_Z^2 u^2\right).$$

Taking $t = e_i$, where $\{e_1, ..., e_k\}$ is the canonical basis of $\mathbb{R}^k$, we have
$Z(e_i) = X_i$, so that each $X_i$ is Gaussian. Let $N(m_i, \sigma_i)$ be the distribution
of $X_i$. Then, on the one hand,

$$m_Z = E(Z(t)) = E\left(\sum_{i=1}^{k} t_i X_i\right) = \sum_{i=1}^{k} t_i E(X_i) = \sum_{i=1}^{k} t_i m_i = (t, m)_k$$

and, on the other hand,

$$\sigma_Z^2 = V(Z(t)) = \sum_{i,j=1}^{k} C_{ij} t_i t_j = (t, Ct)_k,$$

since:

$$E\left(\left(\sum_{i=1}^{k} t_i X_i\right)^2\right) = E\left(\sum_{i,j=1}^{k} t_i t_j X_i X_j\right) = \sum_{i,j=1}^{k} t_i t_j E(X_i X_j)$$

and

$$\left[E\left(\sum_{i=1}^{k} t_i X_i\right)\right]^2 = \left(\sum_{i=1}^{k} t_i m_i\right)^2 = \sum_{i,j=1}^{k} t_i t_j m_i m_j = \sum_{i,j=1}^{k} t_i t_j E(X_i) E(X_j),$$

so that:

$$\varphi_Z(u) = \exp\left(i\,(t, m)_k u - \frac{1}{2}(t, Ct)_k u^2\right).$$

Consequently,

$$\varphi_X(t) = \varphi_Z(1) = \exp\left(i\,(t, m)_k - \frac{1}{2}(t, Ct)_k\right)$$

and X is Gaussian and has the distribution $N(m, C)$. ■

We may generate a Gaussian vector by considering independent Gaussian 
variables: 

Proposition 1.27.- If $X_i \sim N(m_i, \sigma_i)$ and the variables $X_1, ..., X_k$ are
independent, then $X = (X_1, ..., X_k)$ is a Gaussian vector and its covariance
matrix C verifies:

$$C_{ij} = 0, \text{ if } i \ne j; \qquad C_{ii} = \sigma_i^2 = V(X_i). \; ■$$

This result is a consequence of the following Lemma: 

Lemma 1.5.- Let $X = (X_1, ..., X_n)$ be a vector formed of independent
Gaussian variables ($X_i \sim N(m_i, \sigma_i)$). Then, for any
$(\alpha_0, \alpha_1, ..., \alpha_n) \in \mathbb{R}^{n+1}$:

$$\alpha_0 + \sum_{i=1}^{n} \alpha_i X_i \text{ is Gaussian and has the distribution } N(m, \sigma);$$

$$m = \alpha_0 + \sum_{i=1}^{n} \alpha_i m_i \quad \text{and} \quad \sigma^2 = \sum_{i=1}^{n} \alpha_i^2 \sigma_i^2. \; ■$$

PROOF.- Let $Y = \alpha_0 + \sum_{i=1}^{n} \alpha_i X_i$. The characteristic function of Y is:

$$\varphi(t) = E(\exp(itY)) = \exp(i\alpha_0 t)\, E\left(\exp\left(it \sum_{i=1}^{n} \alpha_i X_i\right)\right).$$

Given that the variables are independent:

$$\varphi(t) = \exp(i\alpha_0 t) \prod_{i=1}^{n} E(\exp(i\alpha_i t X_i)) = \exp(i\alpha_0 t) \prod_{i=1}^{n} \varphi_i(\alpha_i t),$$

where $\varphi_i$ is the characteristic function of $X_i$. Thus,

$$\varphi(t) = \exp\left(imt - \frac{\sigma^2}{2} t^2\right); \qquad m = \alpha_0 + \sum_{i=1}^{n} \alpha_i m_i \quad \text{and} \quad \sigma^2 = \sum_{i=1}^{n} \alpha_i^2 \sigma_i^2$$

and Y is Gaussian. ■

Proof of the proposition.- It is an immediate consequence of the
preceding lemma. ■
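These results suggest a practical way of generating variates from a Gaussian vector with a prescribed mean m and covariance matrix C: apply an affine map to a vector of independent standard Gaussian variables. The sketch below uses a Cholesky factorization $C = LL^t$, which is a standard technique but is not taken from the text; the values of m, C and the sample size N are assumptions chosen for the illustration.

```matlab
% Variates from N(m, C) obtained from independent N(0,1) variables
m = [1; -2; 0.5];                       % assumed mean vector
C = [4 1 0; 1 2 0.5; 0 0.5 1];          % assumed covariance (symmetric positive definite)
L = chol(C, 'lower');                   % Cholesky factor: C = L * L'
N = 100000;                             % assumed number of variates
G = randn(3, N);                        % independent N(0,1) components
X = repmat(m, 1, N) + L * G;            % each column is a variate from N(m, C)
m_hat = mean(X, 2);                     % empirical mean
Xc = X - repmat(m_hat, 1, N);
C_hat = (Xc * Xc') / (N - 1);           % empirical covariance matrix
disp(m_hat');  disp(C_hat);
```

Each component of $LG$ is a linear combination of independent Gaussian variables, which is why the columns of X are Gaussian vectors; the empirical mean and covariance should be close to m and C.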

We also have: 

Proposition 1.28.- Let $X = (X_1, X_2)$ be a random vector such that
$X_i \sim N(m_i, \sigma_i)$. Then, the variables $X_1$ and $X_2$ are independent if and only
if the characteristic function of X is

$$\varphi(t) = \exp\left(i\,(t_1 m_1 + t_2 m_2) - \frac{1}{2}\left(\sigma_1^2 t_1^2 + \sigma_2^2 t_2^2\right)\right). \; ■$$

PROOF.- $X_1$ and $X_2$ are independent if and only if $\varphi(t) =
\varphi_{X_1}(t_1)\, \varphi_{X_2}(t_2)$, which establishes the result. ■

Corollary 1.11.- Let $X = (X_1, ..., X_k)$ be a random vector such that
$X_i \sim N(m_i, \sigma_i)$. Then the components of X are independent if and only if
the characteristic function of X is

$$\varphi(t) = \exp\left(i \sum_{i=1}^{k} t_i m_i - \frac{1}{2} \sum_{i=1}^{k} \sigma_i^2 t_i^2\right). \; ■$$

PROOF.- It is an immediate consequence of the preceding result. ■

Corollary 1.12.- Let $X = (X_1, ..., X_k)$ be a Gaussian random vector.
Then its components are independent if and only if $Cov(X_i, X_j) = 0$ for
$i \ne j$. ■

PROOF.- $\Longrightarrow$ : this implication is immediate, since the independence implies
that $Cov(X_i, X_j) = 0$ (Cf. preceding results). $\Longleftarrow$ : Let $\varphi$ be the
characteristic function of X and $X_i \sim N(m_i, \sigma_i)$. If $Cov(X_i, X_j) = 0$ for
$i \ne j$, the covariance matrix of X is diagonal and, X being a Gaussian random
vector, we have

$$\varphi(t) = \exp\left(i \sum_{i=1}^{k} t_i m_i - \frac{1}{2} \sum_{i=1}^{k} \sigma_i^2 t_i^2\right),$$

so that the components of X are independent (Cf. Corollary 1.11). ■


Corollary 1.13.- Let $X = (X_1, ..., X_k)$ be a random Gaussian vector and
Y a random Gaussian variable such that $Cov(Y, X_i) = 0$, $1 \le i \le k$. Then Y
is independent of X and $Z = (Y, X_1, ..., X_k)$ is a Gaussian random vector. ■

PROOF.- Let us denote by $m_A$ the mean of A and by $C_A$ the covariance
matrix of A. The preceding result shows that Y is independent of X.
Consequently, for $\tau = (t_0, t) \in \mathbb{R} \times \mathbb{R}^k = \mathbb{R}^{k+1}$:

$$E\left(\exp\left(i\,(\tau, Z)_{k+1}\right)\right) = E(\exp(i t_0 Y))\, E\left(\exp\left(i\,(t, X)_k\right)\right),$$

so that:

$$\varphi_Z(\tau) = \varphi_Y(t_0)\, \varphi_X(t)$$

and

$$\varphi_Z(\tau) = \exp\left(i t_0 m_Y - \frac{1}{2}\sigma_Y^2 t_0^2 + i\,(t, m_X)_k - \frac{1}{2}(t, C_X t)_k\right),$$

i.e.,

$$\varphi_Z(\tau) = \exp\left(i\,(\tau, m_Z)_{k+1} - \frac{1}{2}(\tau, C_Z \tau)_{k+1}\right).$$

Thus, Z is a Gaussian vector. ■

When the matrix C is invertible, the probability density of X is:

$$f(x) = \frac{1}{(2\pi)^{k/2}\, |\det(C)|^{1/2}} \exp\left(-\frac{1}{2}\left(x - m,\; C^{-1}(x - m)\right)_k\right).$$
1.14. Samples of random variables

Let us consider a random variable $X \in L^2(\Omega, P)$ having the distribution
F. This variable has a finite mean m, a finite standard deviation $\sigma$ and a finite
variance $\sigma^2$:

$$E(X) = m, \quad V(X) = \sigma^2, \quad \sigma(X) = \sigma.$$

In practice, the distribution of X and the values m and $\sigma$ are often unknown
and have to be determined by using a sample of X.

Naively, an n-sample (or sample of size n) of a random variable X is a set
of n independent values of X: $(X_1, ..., X_n)$, where $X_i = X(\omega_i)$, for some
$\omega_i \in \Omega$. Thus, a sample of X is a set of realizations of X (or variates from
X). These values are used in order to estimate the unknown quantities, by
using estimators such as, for instance, maximum likelihood, maximum entropy
or least squares ones. The theory of the estimators is outside the scope of
this text and will not be studied here (the reader can find numerous
references about this point in the literature): we limit ourselves to recalling that
the distribution F of X is estimated by using the empirical distribution of the
sample and the mean is approximated by the empirical mean of the sample. It
is interesting to notice that an n-sample may be seen as a vector defined on a
random finite population, which connects the general approach to the finite
population one: the unknown quantities are usually approximated by applying
a finite population approach to the values observed in the sample.

There are five key results relating the properties of the samples of a
variable X of finite mean and finite variance: the weak law of large numbers,
the strong law of large numbers, the central limit theorem, the law of large
deviations and the Glivenko-Cantelli theorem. The first two state that the
empirical mean of the sample converges to the mean of X when the size of
the sample increases. The third one says that the normalized average of independent
identically distributed random variables is asymptotically normally distributed.
The fourth states that the probability of having a large difference between the
empirical mean of the sample and the mean of X rapidly decreases with the
size of the difference and the size of the sample (it gives an exponentially
decreasing upper bound for that probability). The last one establishes that the
empirical distribution converges to F uniformly.
Formally, let us consider a sequence $\{X_n\}_{n \in \mathbb{N}} \subset L^2(\Omega, P)$ formed by
independent random variables having the same distribution F as X (i.e. the
distribution of $X_n$ is F, $\forall n \in \mathbb{N}$). Since their distribution is the same, all the
elements of the sequence have the same mean, variance and standard deviation
as X:

$$\forall n \in \mathbb{N} : E(X_n) = m, \quad V(X_n) = \sigma^2, \quad \sigma(X_n) = \sigma.$$

Let us introduce:

$$\psi(t) = \log\left(E(\exp(tX))\right),$$

$$h(\eta) = \sup\{t(\eta + m) - \psi(t) : t \text{ such that } t\eta \ge 0\}.$$

The empirical mean $\bar{X}_n$ is defined by:

$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

The empirical distribution $F_n$ is defined as:

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} 1_{(-\infty, x)}(X_i), \qquad 1_{(-\infty, x)}(s) = \begin{cases} 1, & \text{if } s < x \\ 0, & \text{otherwise.} \end{cases}$$

The key results are summarized in the following theorem:

Theorem 1.5.-

1) $E(\bar{X}_n) = m$, $V(\bar{X}_n) = \dfrac{\sigma^2}{n}$, $\sigma(\bar{X}_n) = \dfrac{\sigma}{\sqrt{n}}$.

2) (weak law of large numbers) $\bar{X}_n \to m$ p.

3) (large deviations law) let us consider $\varepsilon > 0$. Then:

$$P(\bar{X}_n - m \ge \varepsilon) \le \exp(-n h(\varepsilon)), \qquad P(\bar{X}_n - m \le -\varepsilon) \le \exp(-n h(-\varepsilon)),$$

$$P(|\bar{X}_n - m| \ge \varepsilon) \le 2\exp(-n H(\varepsilon)), \qquad H(\varepsilon) = \min\{h(\varepsilon), h(-\varepsilon)\}.$$

(The factor 2 in the last bound comes from adding the two preceding ones.)

4) (strong law of large numbers) If $H(\varepsilon) > 0$, for any $\varepsilon > 0$, then $\bar{X}_n \to m$ a.s.

5) (central limit) Let:

$$Z_n = \frac{\bar{X}_n - m}{\sigma/\sqrt{n}}.$$

Then $Z_n \to N(0, 1)$ D.

6) (Glivenko-Cantelli) $F_n \to F$ p. ■
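The statements about the empirical mean and the empirical distribution can be observed on simulated samples. The following sketch assumes samples of the uniform distribution on (0, 1), for which $m = 1/2$, $\sigma^2 = 1/12$ and $F(x) = x$ on (0, 1); the sample size n, the number of replications R and the evaluation grid are arbitrary choices made for the illustration.

```matlab
% Empirical mean and empirical distribution of samples of U(0,1)
n = 400;  R = 5000;                     % assumed sample size and number of replications
m = 0.5;  sigma = sqrt(1/12);           % mean and standard deviation of U(0,1)
X = rand(n, R);                         % each column is an n-sample
Xbar = mean(X, 1);                      % one empirical mean per replication
Z = (Xbar - m) / (sigma / sqrt(n));     % normalized means (central limit theorem)
x = linspace(0, 1, 201);                % evaluation grid for the distributions
Fn = zeros(size(x));
for j = 1:numel(x)
    Fn(j) = mean(X(:, 1) < x(j));       % empirical distribution of the first sample
end
dist_sup = max(abs(Fn - x));            % distance to F(x) = x on the grid
fprintf('mean of means = %g (m = %g)\n', mean(Xbar), m);
fprintf('std of means  = %g (sigma/sqrt(n) = %g)\n', std(Xbar), sigma / sqrt(n));
fprintf('var(Z) = %g, sup|Fn - F| = %g\n', var(Z), dist_sup);
```

The spread of the empirical means should be close to $\sigma/\sqrt{n}$, the variance of the normalized means close to 1, and the distance between the empirical distribution and F small.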

1.15. Gaussian samples 

Gaussian samples have particular properties which will be exploited in the 
following. We present here the essential elements which will be used in this 
text. 

Definition 1.25 (Gaussian sample).- Let $X = (X_1, ..., X_k)$ be a random
vector. We say that X is a sample from $N(m, \sigma)$ if and only if all its
components are independent and have the same distribution $N(m, \sigma)$. ■

A sample from $N(m, \sigma)$ is a Gaussian vector such that $m = (m, ..., m) = m\mathbf{1}$
and $C = \sigma^2\, Id$.

Let us recall that we can associate to a linear transformation $T : \mathbb{R}^k \to \mathbb{R}^k$
an adjoint transformation $T^t$ defined by

$$(T(x), y)_k = (x, T^t(y))_k, \quad \forall x, y \in \mathbb{R}^k.$$

$T^t$ is also linear: indeed, let us consider $\alpha \in \mathbb{R}$, $x, y, z \in \mathbb{R}^k$. Then:

$$(x, T^t(\alpha y + z))_k = (T(x), \alpha y + z)_k = \alpha (T(x), y)_k + (T(x), z)_k$$

and we have:

$$(x, T^t(\alpha y + z))_k = \alpha (x, T^t(y))_k + (x, T^t(z))_k = (x, \alpha T^t(y) + T^t(z))_k.$$

Given that x is arbitrary, we obtain $T^t(\alpha y + z) = \alpha T^t(y) + T^t(z)$ and
$T^t$ is linear.

We have $(T^t)^t = T$, since:

$$(T^t(x), y)_k = (y, T^t(x))_k = (T(y), x)_k = (x, T(y))_k.$$

The linear transformation T is orthogonal if and only if $T^{-1} = T^t$. In this
case,

$$\|T(x)\|^2 = (T(x), T(x))_k = (x, T^t(T(x)))_k = (x, x)_k = \|x\|^2, \quad \forall x \in \mathbb{R}^k.$$

If T is orthogonal, then $T^t$ is orthogonal, since $(T^t)^t = T = (T^{-1})^{-1} = (T^t)^{-1}$.

If $\mathcal{B} = \{b_1, ..., b_k\}$ and $\mathcal{E} = \{e_1, ..., e_k\}$ are orthonormal bases of $\mathbb{R}^k$, then
the transformation corresponding to a change of basis:

$$T(x) = \alpha; \qquad \alpha_i = \sum_{j=1}^{k} x_j\, (e_i, b_j)_k$$

is orthogonal and

$$T^t(y) = \beta; \qquad \beta_i = \sum_{j=1}^{k} y_j\, (e_j, b_i)_k.$$

It is enough to verify that:

$$(\alpha, y)_k = \sum_{i,j=1}^{k} x_j\, (e_i, b_j)_k\, y_i = (x, \beta)_k,$$

so that:

$$(T(x), y)_k = (\alpha, y)_k = (x, \beta)_k = (x, T^t(y))_k.$$

We have: 

Proposition 1.29.- Let $X = (X_1, ..., X_k)$ be a sample from $N(0, \sigma)$.
Then:

i) If $T : \mathbb{R}^k \to \mathbb{R}^k$ is an orthogonal transformation, then $T(X)$ is a
sample from $N(0, \sigma)$;

ii) The components of X in any orthonormal basis of $\mathbb{R}^k$ form a sample
from $N(0, \sigma)$;

iii) Let $E_1 \oplus E_2 \oplus ... \oplus E_p$ be a decomposition of $\mathbb{R}^k$ in p orthogonal linear
subspaces and $P_i(X)$ be the orthogonal projection of X onto $E_i$. Then, each
$P_i(X)$ is a sample from $N(0, \sigma)$ and, in addition, $P_i(X)$ and $P_j(X)$ are
independent for $i \ne j$. ■

PROOF.- Let $\varphi$ be the characteristic function of $N(0, \sigma)$. Given that the
components of X are independent, we have:

$$\varphi_X(t) = \prod_{i=1}^{k} \varphi(t_i) = \exp\left(-\frac{\sigma^2}{2} \|t\|_k^2\right).$$

Let $T : \mathbb{R}^k \to \mathbb{R}^k$ be an orthogonal transformation and $Y = T(X)$. We
have:

$$(t, Y)_k = (t, T(X))_k = (T^t(t), X)_k$$

so that:

$$\varphi_Y(t) = E\left(\exp\left(i\,(t, Y)_k\right)\right) = E\left(\exp\left(i\,(T^t(t), X)_k\right)\right)$$

and

$$\varphi_Y(t) = \exp\left(-\frac{\sigma^2}{2} \|T^t(t)\|_k^2\right) = \exp\left(-\frac{\sigma^2}{2} \|t\|_k^2\right) = \varphi_X(t),$$

so that we have (i). (ii) follows from (i): the basis change transformation is
orthogonal. Finally, (iii) follows from (ii): let $B_i$ be an orthonormal basis of
$E_i$, $i = 1, ..., p$. Then $B = B_1 \cup ... \cup B_p$ is an orthonormal basis of $\mathbb{R}^k$
and the components of X in the basis B form a sample from $N(0, \sigma)$, so that
we have the result. ■
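The invariance of a centered Gaussian sample under orthogonal transformations can be observed numerically. In the sketch below, the orthogonal matrix Q is obtained from the QR factorization of a random matrix, which is one convenient construction among others; the dimension k, the standard deviation sigma and the number N of samples are assumptions made for the illustration.

```matlab
% An orthogonal transformation maps samples from N(0, sigma) to samples from N(0, sigma)
k = 4;  N = 100000;  sigma = 2;         % assumed illustrative values
[Q, ~] = qr(randn(k));                  % Q is orthogonal: Q' * Q = Id
X = sigma * randn(k, N);                % each column is a sample from N(0, sigma)
Y = Q * X;                              % transformed samples T(X)
C_hat = (Y * Y') / N;                   % empirical covariance (the mean is zero)
err_orth = norm(Q' * Q - eye(k));       % orthogonality defect of Q
err_norm = max(abs(sqrt(sum(Y.^2, 1)) - sqrt(sum(X.^2, 1))));  % ||T(x)|| versus ||x||
err_cov = max(abs(C_hat(:) - sigma^2 * reshape(eye(k), [], 1)));
fprintf('orthogonality: %g, norm preservation: %g, covariance error: %g\n', ...
        err_orth, err_norm, err_cov);
```

The empirical covariance of the transformed samples should remain close to $\sigma^2\, Id$, as expected for a sample from $N(0, \sigma)$.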

Theorem 1.6.- Let $(Y, X_1, ..., X_n)$ be a Gaussian vector. Let us denote $X =
(X_1, ..., X_n)$. Then, there exists $\alpha = (\alpha_0, \alpha_1, ..., \alpha_n) \in \mathbb{R}^{n+1}$ such that:

$$E(Y \mid X) = \alpha_0 + \sum_{i=1}^{n} \alpha_i X_i.$$

Moreover, $Y - E(Y \mid X)$ is Gaussian, has mean equal to zero and is
independent of X and $E(Y \mid X)$. ■

PROOF.- Let us consider the auxiliary linear subspace:

$$W = \left\{Z \in L^2(\Omega, P) : Z = \beta_0 + \sum_{i=1}^{n} \beta_i X_i,\; (\beta_0, ..., \beta_n) \in \mathbb{R}^{n+1}\right\}.$$

W is a finite-dimensional linear subspace ($\dim(W) \le n + 1$), so that the
orthogonal projection $P_W(Y)$ of Y onto W exists and is uniquely determined.
In addition, it verifies:

$$P_W(Y) \in W \quad \text{and} \quad E((Y - P_W(Y))\, Z) = 0, \; \forall Z \in W.$$

Since $P_W(Y) \in W$, there exists $\alpha = (\alpha_0, \alpha_1, ..., \alpha_n) \in \mathbb{R}^{n+1}$ such that:

$$P_W(Y) = \alpha_0 + \sum_{i=1}^{n} \alpha_i X_i.$$

It follows from the preceding proposition that both $P_W(Y)$ and
$Y - P_W(Y)$ are Gaussian. Since $Z = 1 \in W$ (it corresponds to $\beta_0 = 1$,
$\beta_i = 0$, $i > 0$), we have:

$$E(Y - P_W(Y)) = E((Y - P_W(Y)) \cdot 1) = 0,$$

so that $Y - P_W(Y)$ has mean equal to zero. Since $X_i \in W$, we also have
$E((Y - P_W(Y))\, X_i) = 0$, so that:

$$Cov(Y - P_W(Y), X_i) = \underbrace{E((Y - P_W(Y))\, X_i)}_{=0} - \underbrace{E(Y - P_W(Y))}_{=0}\, E(X_i) = 0.$$

It follows from this equality that $Y - P_W(Y)$ is independent of $X_i$ (given that
both are Gaussian). Since i is arbitrary, $Y - P_W(Y)$ is independent from X and
$(Y - P_W(Y), X_1, ..., X_n)$ is a Gaussian vector. We also have

$$Cov(Y - P_W(Y), P_W(Y)) = \underbrace{E((Y - P_W(Y))\, P_W(Y))}_{=0} - \underbrace{E(Y - P_W(Y))}_{=0}\, E(P_W(Y)) = 0,$$

so that $Y - P_W(Y)$ is independent from $P_W(Y)$ (given that both are
Gaussian). Let:

$$Z \in S = \left\{s \in L^2(\Omega, P) : s = \phi(X) \text{ for some } \phi : \mathbb{R}^n \to \mathbb{R}\right\}.$$

We have:

$$E((Y - P_W(Y))(P_W(Y) - Z)) = \underbrace{E((Y - P_W(Y))\, P_W(Y))}_{=0} - E((Y - P_W(Y))\, Z),$$

so that, $Y - P_W(Y)$ being independent from X (and thus from Z):

$$E((Y - P_W(Y))(P_W(Y) - Z)) = -E((Y - P_W(Y))\, Z) = -E(Y - P_W(Y))\, E(Z) = 0.$$

Thus,

$$E((Y - Z)^2) = E((Y - P_W(Y) + P_W(Y) - Z)^2)$$

verifies:

$$E((Y - Z)^2) = E((Y - P_W(Y))^2) + E((P_W(Y) - Z)^2) \ge E((Y - P_W(Y))^2),$$

so that:

$$P_W(Y) = \arg\min\{\|s - Y\| : s \in S\}$$

and $P_W(Y)$ is the orthogonal projection of Y onto S. Thus,
$P_W(Y) = E(Y \mid X)$. ■
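The theorem can be illustrated by simulation: for a Gaussian vector, the conditional mean is the orthogonal projection of Y onto the span of 1 and the components of X, which a least-squares fit approximates on a large sample. The sketch below is only an illustration under assumptions: the vector (Y, X1, X2) is built from independent standard Gaussian variables with arbitrarily chosen coefficients, so that here the conditional mean of Y given (X1, X2) is 1 + 2 X1 - X2.

```matlab
% Conditional mean of a Gaussian vector as a linear projection (least squares)
N = 200000;                                  % assumed number of variates
G = randn(3, N);                             % independent N(0,1) variables
X1 = G(1, :);
X2 = 0.5 * G(1, :) + G(2, :);                % correlated with X1
Y  = 1 + 2 * X1 - X2 + 0.3 * G(3, :);        % (Y, X1, X2) is a Gaussian vector
A = [ones(N, 1), X1', X2'];                  % generators of the subspace W
alpha = A \ Y';                              % least-squares coefficients (alpha_0, alpha_1, alpha_2)
Rsd = Y' - A * alpha;                        % empirical version of Y - P_W(Y)
c = [mean(Rsd), mean(Rsd .* X1'), mean(Rsd .* X2')];   % empirical orthogonality relations
disp(alpha');  disp(c);
```

The fitted coefficients should be close to (1, 2, -1) and the residual is, up to rounding, orthogonal to 1, X1 and X2, which mirrors the relations used in the proof.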

We also have

Proposition 1.30.- Let $U = (U_1, U_2)$ be a pair of independent random
variables having the same distribution $U(0, 1)$. Let:

$$X_1 = \sqrt{-2\log(U_1)}\, \sin(2\pi U_2); \qquad X_2 = \sqrt{-2\log(U_1)}\, \cos(2\pi U_2).$$

Then $X = (X_1, X_2)$ is a pair of independent random variables having the
same distribution $N(0, 1)$. ■

PROOF.- The Jacobian of the transformation $\psi$ such that
$\psi(U_1, U_2) = (X_1, X_2)$ is:

$$J = \left|\det \begin{pmatrix} -\dfrac{\sin(2\pi U_2)}{U_1\sqrt{-2\log(U_1)}} & 2\pi\sqrt{-2\log(U_1)}\, \cos(2\pi U_2) \\[2ex] -\dfrac{\cos(2\pi U_2)}{U_1\sqrt{-2\log(U_1)}} & -2\pi\sqrt{-2\log(U_1)}\, \sin(2\pi U_2) \end{pmatrix}\right|,$$

that is,

$$J = \frac{2\pi}{U_1}.$$

In addition,

$$X_1^2 + X_2^2 = -2\log(U_1) \Longrightarrow U_1 = \exp\left(-\frac{1}{2}\|X\|^2\right)$$

and

$$\sin(2\pi U_2) = \frac{X_1}{\|X\|} \text{ and } \cos(2\pi U_2) = \frac{X_2}{\|X\|} \Longrightarrow U_2 = \frac{1}{2\pi} \arctan\frac{X_1}{X_2}.$$

The probability density of U is:

$$f(u) = \begin{cases} 1, & \text{if } u \in (0, 1) \times (0, 1) \\ 0, & \text{otherwise} \end{cases}$$

and the probability density of X is $g(x) = f(\psi^{-1}(x)) / J$ (see [DE 92]), that
is:

$$g(x) = \frac{1}{2\pi} \exp\left(-\frac{1}{2}\|x\|^2\right) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x_1^2}{2}\right) \cdot \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x_2^2}{2}\right).$$

It follows from this equality that $X = (X_1, X_2)$ is a pair of independent
random variables having the distribution $N(0, 1)$. ■
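The proposition gives a direct way of producing standard Gaussian variates from uniform ones. The following is a minimal sketch of the transformation $X_1 = \sqrt{-2\log(U_1)}\sin(2\pi U_2)$, $X_2 = \sqrt{-2\log(U_1)}\cos(2\pi U_2)$, assuming that rand returns values in the open interval (0, 1), so that the logarithm is finite; the number N of pairs is an arbitrary choice.

```matlab
% Standard Gaussian pairs from pairs of independent U(0,1) variables
N = 200000;                             % assumed number of pairs
U1 = rand(1, N);  U2 = rand(1, N);      % independent U(0,1) variables
R = sqrt(-2 * log(U1));                 % radial part
X1 = R .* sin(2 * pi * U2);
X2 = R .* cos(2 * pi * U2);
stats = [mean(X1), mean(X2), var(X1), var(X2), mean(X1 .* X2)];
fprintf('means: %g %g, variances: %g %g, E(X1 X2): %g\n', stats);
```

The empirical means and the empirical value of $E(X_1 X_2)$ should be close to 0 and the empirical variances close to 1.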
1.16. Stochastic processes

Naively, a stochastic process is a family of random variables depending on a
parameter: $X = \{X(\lambda)\}_{\lambda \in \Lambda}$, all defined on the same universe $\Omega$, i.e. $\forall \lambda \in
\Lambda : X(\lambda) : \Omega \to \mathbb{R}$. $\Lambda$ is the family of indexes and we say that the stochastic
process is indexed by $\Lambda$. Thus a stochastic process may simply be considered
as a function $X : \Lambda \times \Omega \to \mathbb{R}$.

We are mainly interested in stochastic processes indexed by time. Thus, we
mainly consider applications $X : (a, b) \times \Omega \to \mathbb{R}$, where $(a, b) \subset \mathbb{R}$, i.e.,

Definition 1.26 (stochastic process).- Let $a, b \in \mathbb{R}$ be such that $a < b$. We
say that X is a stochastic process indexed by $(a, b)$ on the probability space
$(\Omega, P)$ if and only if

$$\forall t \in (a, b) : \omega \to X(t, \omega) \text{ is a random variable on } \Omega. \; ■$$

Thus, for a given t, the random variable $X_t(\omega) = X(t, \omega)$ has a cumulative
distribution given by:

$$P_t(x) = \Pr(X_t < x) \qquad [1.25]$$

and a density of probability $p_t$. These quantities allow the evaluation of the
mean of the process, given by

$$\mu_X(t) = E[X_t] = \int_{-\infty}^{+\infty} x\, p_t(x)\, dx \qquad [1.26]$$

and its variance, given by

$$\sigma_X^2(t) = E[(X_t - \mu_X(t))^2] = E[X_t^2] - \mu_X(t)^2. \qquad [1.27]$$

The autocorrelation function of the process is

$$R_{XX}(s, t) = E[X_t X_s]. \qquad [1.28]$$

The covariance function of the process is

$$C(s, t) = E[(X_s - \mu_X(s))(X_t - \mu_X(t))] = E[X_s X_t] - \mu_X(s)\mu_X(t) = R_{XX}(s, t) - \mu_X(s)\mu_X(t). \qquad [1.29]$$

We have $C(t, t) = \sigma_X^2(t)$. All these definitions extend to the case of
vectors.
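In practice, the mean and the covariance function of a process are often estimated from simulated paths. The sketch below assumes the simple process $X(t, \omega) = A(\omega)\cos t + B(\omega)\sin t$, with A and B independent and $N(0, 1)$, chosen here only because its mean (zero) and covariance ($\cos(s - t)$) are known in closed form; the number of paths and the time grid are also assumptions.

```matlab
% Empirical mean and covariance function of a simulated process
N = 50000;                              % assumed number of simulated paths
t = linspace(0, 2, 21);                 % time grid on (a, b) = (0, 2)
A = randn(N, 1);  B = randn(N, 1);      % independent N(0,1) coefficients
X = A * cos(t) + B * sin(t);            % X(i, j) is the value at time t(j) on path i
mu = mean(X, 1);                        % estimate of the mean function
Xc = X - repmat(mu, N, 1);
C = (Xc' * Xc) / (N - 1);               % C(i, j) estimates the covariance between t(i) and t(j)
[S, T] = meshgrid(t, t);
err = max(max(abs(C - cos(S - T))));    % comparison with the exact covariance cos(s - t)
fprintf('max |mu| = %g, max covariance error = %g\n', max(abs(mu)), err);
```

The diagonal of C estimates the variance of the process at each time of the grid.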


1.17. Hilbertian structure 

Stochastic processes possess Hilbertian properties analogous to those of
random variables. For instance, we may consider the set
$V = \mathcal{S}p\,(t, (a, b), L^2(\Omega, P))$, i.e. the set formed by the simple functions
defined by partitions of $(a, b)$ and taking their values in $L^2(\Omega, P)$. We have

$$V = \Big\{X : (a, b) \times \Omega \to \mathbb{R} : X(t, \omega) = \sum_{i=0}^{n-1} x_i(\omega)\, 1_{(t_i, t_{i+1})}(t),\; t \in \mathfrak{Part}((a, b)),\; x_i \in L^2(\Omega, P)\Big\}.$$

We may define a scalar product on V:

$$(X, Y) = \int_{(a,b)} E(XY).$$

Indeed, $(\cdot, \cdot)$ is bilinear, symmetric and positive definite: when $X, Y \in V$,
we have

$$X(t, \omega) = \sum_{i=0}^{n_x - 1} X_i(\omega)\, 1_{(x_i, x_{i+1})}(t) \quad \text{and} \quad Y(t, \omega) = \sum_{i=0}^{n_y - 1} Y_i(\omega)\, 1_{(y_i, y_{i+1})}(t),$$

where $x = (x_0, ..., x_{n_x}) \in \mathfrak{Part}((a, b))$ and $y = (y_0, ..., y_{n_y}) \in
\mathfrak{Part}((a, b))$. For

$$n > 0 \text{ such that } h = \frac{b - a}{n} \le \min\{\delta(x), \delta(y)\},$$

we have

$$XY = \sum_{i=0}^{n-1} X_i Y_i\, 1_{(a_i, a_{i+1})}(t), \qquad a_i = a + ih, \quad X_i = X(a_i), \quad Y_i = Y(a_i),$$

so that

$$(X, Y) = h \sum_{i=0}^{n-1} E(X_i Y_i).$$

In particular,

$$\|X\|^2 = (X, X) = h \sum_{i=0}^{n-1} E([X_i]^2).$$

The completion of V for this scalar product is a Hilbert space denoted
$\mathcal{V} = L^2((a, b), L^2(\Omega, P))$.

In the following, we shall often use the extension principle, which consists
of extending a linear continuous map defined on V to the set formed by all the
Cauchy sequences of its elements and to the sets $\mathcal{V}$ such that V is dense on $\mathcal{V}$.
The extension principle is based on the following theorems:

Theorem 1.7.- Let V be a pre-Hilbertian space of scalar product $(\cdot, \cdot)$ and
$I : V \to \mathbb{R}$ a continuous linear application on V. Let

$$\mathbb{V} = \{\tilde{v} = \{v_n\}_{n \in \mathbb{N}} \subset V : \tilde{v} \text{ is a Cauchy sequence for } (\cdot, \cdot)\},$$

and

$$\|\tilde{v}\| = \lim_{n \to +\infty} \|v_n\|.$$

Then

i) $\forall \tilde{v} \in \mathbb{V} : \{I(v_n)\}_{n \in \mathbb{N}} \subset \mathbb{R}$ is a Cauchy sequence;

ii) There exists $I(\tilde{v}) \in \mathbb{R}$ such that $I(v_n) \to I(\tilde{v})$ when $n \to +\infty$.

iii) If $\tilde{w} \in \mathbb{V}$ verifies $\|w_n - v_n\| \to 0$ when $n \to +\infty$, then $I(\tilde{w}) =
I(\tilde{v})$.

iv) If $\alpha, \beta \in \mathbb{R}$; $\tilde{v}, \tilde{w} \in \mathbb{V}$, $\tilde{u} = \alpha\tilde{v} + \beta\tilde{w}$, then $I(\tilde{u}) = \alpha I(\tilde{v}) + \beta I(\tilde{w})$.

v) Let $M \in \mathbb{R}$ verify $|I(v)| \le M\|v\|$, $\forall v \in V$. Then $|I(\tilde{v})| \le M\|\tilde{v}\|$, $\forall
\tilde{v} \in \mathbb{V}$. ■

PROOF.- Let us observe that $\{\|v_n\|\}_{n \in \mathbb{N}} \subset \mathbb{R}$ is a Cauchy sequence, since

$$\big|\, \|v_m\| - \|v_n\| \,\big| \le \|v_m - v_n\|$$

and

$$m, n \ge n(\varepsilon) \Longrightarrow \|v_m - v_n\| \le \varepsilon \Longrightarrow \big|\, \|v_m\| - \|v_n\| \,\big| \le \varepsilon.$$

Thus, there exists $m \in \mathbb{R}$ such that $\|v_n\| \to m$ when $n \to +\infty$.

i) $\{I(v_n)\}_{n \in \mathbb{N}} \subset \mathbb{R}$ is a Cauchy sequence, since

$$|I(v_m) - I(v_n)| = |I(v_m - v_n)| \le M\|v_m - v_n\|$$

and $\{v_n\}_{n \in \mathbb{N}}$ is a Cauchy sequence.

ii) Given that $\mathbb{R}$ is complete, there exists $m \in \mathbb{R}$ such that

$$m = \lim_{n \to +\infty} I(v_n).$$

The result is obtained by taking $I(\tilde{v}) = m$.

iii) If $\{w_n\}_{n \in \mathbb{N}} \subset V$ verifies $\|w_n - v_n\| \to 0$ when $n \to +\infty$, we have

$$|I(w_n) - I(v_n)| = |I(w_n - v_n)| \le M\|w_n - v_n\| \to 0,$$

so that

$$\lim_{n \to +\infty} I(w_n) = \lim_{n \to +\infty} I(v_n)$$

and, as a consequence, $I(\tilde{w}) = I(\tilde{v})$.

iv) We have

$$I(u_n) = I(\alpha v_n + \beta w_n) = \alpha I(v_n) + \beta I(w_n) \to \alpha I(\tilde{v}) + \beta I(\tilde{w}),$$

so that $I(\tilde{u}) = \alpha I(\tilde{v}) + \beta I(\tilde{w})$.

v) Since $\|v_n\| \to \|\tilde{v}\|$ when $n \to +\infty$ and $|I(v_n)| \to |I(\tilde{v})|$, taking the limit
in $|I(v_n)| \le M\|v_n\|$ gives $|I(\tilde{v})| \le M\|\tilde{v}\|$. ■
Theorem 1.8.- Let V be a pre-Hilbertian space for the scalar product $(\cdot, \cdot)$
and $I : V \to \mathbb{R}$ a continuous linear application on V. If $\mathcal{V}$ is a linear space
such that V is dense on $\mathcal{V}$, then I may be extended to $\mathcal{V}$, i.e. there exists
$I_{\mathcal{V}} : \mathcal{V} \to \mathbb{R}$ linear continuous which coincides with I on V. In addition, if
$M \in \mathbb{R}$ verifies $|I(v)| \le M\|v\|$, $\forall v \in V$, then $|I_{\mathcal{V}}(v)| \le M\|v\|$, $\forall v \in \mathcal{V}$. ■

PROOF.- Let us recall that a linear map $I : V \to \mathbb{R}$ is continuous if and only
if there exists $M \in \mathbb{R}$ such that $|I(v)| \le M\|v\|$, $\forall v \in V$. Given that V
is dense on $\mathcal{V}$: for any $v \in \mathcal{V}$, there exists $\tilde{v} = \{v_n\}_{n \in \mathbb{N}} \subset V$ such that
$\|v_n - v\| \to 0$ when $n \to +\infty$. Let us consider

$$I_{\mathcal{V}}(v) = \lim_{n \to +\infty} I(v_n).$$

$\tilde{v}$ is a Cauchy sequence (since it converges). Thus, $I_{\mathcal{V}}(v) = I(\tilde{v})$. It follows
from the preceding theorem that the limit exists and is well-defined: if $\tilde{w} =
\{w_n\}_{n \in \mathbb{N}} \subset V$ verifies $\|w_n - v\| \to 0$ when $n \to +\infty$, then $\{w_n\}_{n \in \mathbb{N}}$
is a Cauchy sequence (given that it converges) and $\|w_n - v_n\| \to 0$ when
$n \to +\infty$, so that $I(\tilde{v}) = I(\tilde{w})$ and, as a consequence,

$$\lim_{n \to +\infty} I(v_n) = \lim_{n \to +\infty} I(w_n).$$

$I_{\mathcal{V}} : \mathcal{V} \to \mathbb{R}$ is linear: let $\alpha, \beta \in \mathbb{R}$; $v, w \in \mathcal{V}$; $u = \alpha v + \beta w$, $\tilde{v} = \{v_n\}_{n \in \mathbb{N}}
\subset V$, $\tilde{w} = \{w_n\}_{n \in \mathbb{N}} \subset V$, $\|v_n - v\| \to 0$ and $\|w_n - w\| \to 0$ when
$n \to +\infty$. Then $\tilde{u} = \{u_n\}_{n \in \mathbb{N}} \subset V$ given by $u_n = \alpha v_n + \beta w_n$ verifies
$\|u_n - u\| \to 0$ and $I(\tilde{u}) = \alpha I(\tilde{v}) + \beta I(\tilde{w})$. Thus, $I_{\mathcal{V}}(u) = \alpha I_{\mathcal{V}}(v) + \beta I_{\mathcal{V}}(w)$.

$I_{\mathcal{V}} : \mathcal{V} \to \mathbb{R}$ is continuous: if $\tilde{v} = \{v_n\}_{n \in \mathbb{N}} \subset V$ verifies $\|v_n - v\| \to 0$ when $n \to +\infty$,
then $\|v_n\| \to \|v\|$, so that $\|\tilde{v}\| = \|v\|$ and we have $|I(\tilde{v})| \le M\|\tilde{v}\| \Longrightarrow
|I_{\mathcal{V}}(v)| \le M\|v\|$. ■

Corollary 1.14.- Let V be a pre-Hilbertian space for the scalar product
$(\cdot, \cdot)$ and $I : V \to \mathbb{R}$ a continuous linear map on V. If $\mathcal{V}$ is the completion
of V for $(\cdot, \cdot)$, then I extends to $\mathcal{V}$. ■

PROOF.- It is enough to notice that $\mathbb{V} = \{\{v_n\}_{n \in \mathbb{N}} \subset V : \{v_n\}_{n \in \mathbb{N}}$ is a
Cauchy sequence for $(\cdot, \cdot)\}$ is dense on $\mathcal{V}$. ■
1.18. Wiener process

One of the main stochastic processes is the following one:

Definition 1.27 (Unidimensional Wiener process).- Let $W = \{W(t)\}_{t \ge 0}$
be a stochastic process on the probability space $(\Omega, P)$. We say that W is
a unidimensional Wiener process or unidimensional Brownian motion if and
only if:

i) $W(0) = 0$;

ii) $\forall (s, t) \in \mathbb{R}^2$ such that $0 \le s \le t$:

a) $W(t) - W(s) \sim N(0, \sqrt{t - s})$;

b) $W(t) - W(s)$ is independent of $W(b) - W(a)$, $\forall (a, b) \in \mathbb{R}^2$ such
that $0 \le a \le b \le s \le t$. ■

Sometimes, the expression univariate Wiener process is used in order to
refer to this stochastic process. We have

Proposition 1.31.- Let $(s, t) \in \mathbb{R}^2$ be such that $0 \le s \le t$. Then

i) $E((W(t) - W(s))\, W(s)) = 0$;

ii) $E(W(s)^2) = s$;

iii) $E(W(t)\, W(s)) = s$;

iv) $E((W(t) - W(s))\, W(t)) = t - s$. ■

PROOF.- Taking $a = 0$ and $b = s$, we have: $W(t) - W(s)$ is independent
from $W(s) - W(0) = W(s) - 0 = W(s)$, so that

$$E((W(t) - W(s))\, W(s)) = \underbrace{E(W(t) - W(s))}_{=0}\, E(W(s)) = 0.$$

In an analogous way,

$$E(W(s)^2) = E((W(s) - W(0))^2) = V(W(s) - W(0)) = s - 0 = s.$$

Thus,

$$E(W(t)\, W(s)) = E((W(t) - W(s) + W(s))\, W(s)) = \underbrace{E((W(t) - W(s))\, W(s))}_{=0} + \underbrace{E(W(s)^2)}_{=s},$$

so that $E(W(t)\, W(s)) = s$. Finally,

$$E((W(t) - W(s))\, W(t)) = \underbrace{E(W(t)^2)}_{=t} - \underbrace{E(W(t)\, W(s))}_{=s} = t - s. \; ■$$
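These four moments of a Wiener process W at times $0 \le s \le t$ (that is, $E((W(t)-W(s))W(s)) = 0$, $E(W(s)^2) = s$, $E(W(t)W(s)) = s$ and $E((W(t)-W(s))W(t)) = t - s$) can be checked by a Monte Carlo simulation built directly on the definition: $W(s)$ is drawn as an increment from 0 and $W(t)$ is obtained by adding an independent increment. The times s, t and the number N of variates below are assumptions made for the illustration.

```matlab
% Monte Carlo check of second-order moments of a Wiener process
N = 200000;  s = 0.6;  t = 1.5;         % assumed values, 0 <= s <= t
Ws = sqrt(s) * randn(1, N);             % W(s) - W(0), standard deviation sqrt(s)
Wt = Ws + sqrt(t - s) * randn(1, N);    % adds the independent increment W(t) - W(s)
e1 = mean((Wt - Ws) .* Ws);             % should be close to 0
e2 = mean(Ws.^2);                       % should be close to s
e3 = mean(Wt .* Ws);                    % should be close to s
e4 = mean((Wt - Ws) .* Wt);             % should be close to t - s
fprintf('%g %g %g %g\n', e1, e2, e3, e4);
```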


Theorem 1.9.- Let $(s, t) \in \mathbb{R}^2$ be such that $0 \le s \le t$; $k \in \mathbb{N}$ be such that
$k > 0$; $\mathbf{t} = (t_1, ..., t_k) \in \mathbb{R}^k$ be such that $0 \le t_i \le s$ ($1 \le i \le k$) and $t_i \le t_{i+1}$
($1 \le i \le k - 1$); $X = (W(t_1), ..., W(t_k))$, $Y = W(t) - W(s)$. Then
$E(Y \mid X) = 0$. ■

Proof.- Let $a = 0$, $b = t_i$: $0 \le a \le b \le s$, so that $W(t) - W(s)$ is
independent from $W(b) - W(a) = W(t_i)$. So, Y is independent from X. As
a consequence, $E(Y \mid X) = E(Y) = 0$. ■

Definition 1.28 (Multidimensional Wiener process).- Let $W =
\{W(t)\}_{t \ge 0}$ be a family of random vectors of $\mathbb{R}^k$ on the probability space
$(\Omega, P)$. We say that W is a k-dimensional Wiener process or k-dimensional
Brownian motion if and only if $W(t) = (W_1(t), ..., W_k(t))$, where

(i) $\{W_i(t)\}_{t \ge 0}$ is a unidimensional Wiener process ($1 \le i \le k$);

(ii) The components of W are mutually independent, i.e. if $i \ne j$ then $W_i(t)$
is independent of $W_j(s)$, $\forall (s, t) \in \mathbb{R}^2$ such that $s, t \ge 0$. ■

Sometimes, the expression multivariate Wiener process is used in order to
refer to this stochastic process.

In order to simulate a Wiener process, a discretization of time is required.
The reader may find in the literature many works on this topic. Here, we
illustrate the simulation by using a simple Euler discretization combined with
the random number generator randn:
Listing 1.21. Simple simulation of a Wiener process

function w = wiener(ndim, nstep, timestep)
%
% generates nstep steps of a Wiener process of dimension ndim
%
% IN:
% ndim     : dimension - type integer
% nstep    : number of steps - type integer
% timestep : the time step - type double
%
% OUT:
% w : ndim x (nstep + 1) table - type double
%     w(:, i) is the value of W((i - 1) * timestep)
%
w = zeros(ndim, nstep + 1);    % W(0) = 0 is stored in the first column
aux = sqrt(timestep);          % standard deviation of each increment
for i = 1:nstep
    % add an independent increment of distribution N(0, sqrt(timestep))
    w(:, i + 1) = w(:, i) + aux * randn(ndim, 1);
end
end



Figure 1.6. Simulation of a Wiener process 
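The same Euler discretization of a Wiener process can also be written without a loop, by generating all the increments at once and accumulating them with cumsum. The following is a minimal vectorized sketch, given only as an alternative illustration; the dimension, the number of steps and the time step are assumed values.

```matlab
% Vectorized simulation of a Wiener process (increments accumulated by cumsum)
ndim = 2;  nstep = 1000;  timestep = 1e-3;     % assumed illustrative values
dw = sqrt(timestep) * randn(ndim, nstep);      % independent increments, N(0, sqrt(timestep))
w = [zeros(ndim, 1), cumsum(dw, 2)];           % w(:, i) approximates W((i - 1) * timestep)
t = (0:nstep) * timestep;                      % corresponding times
% plot(t, w);  xlabel('t');  ylabel('W(t)');   % uncomment to draw one curve per component
fprintf('final values: %s\n', mat2str(w(:, end)', 4));
```

The first column holds $W(0) = 0$ and the differences between consecutive columns reproduce the generated increments.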


1.19. Ito integrals 

We present in this section the definition of some stochastic integrals. We
are mainly interested in

$$\int_0^T \varphi(W(t))\, dt \quad \text{and} \quad \int_0^T \varphi(W(t))\, dW(t),$$

where $\varphi : \mathbb{R} \to \mathbb{R}$ is a function and $W = \{W(t)\}_{t \ge 0}$ is a Brownian motion.
The definitions below are usually referred to as being those "in the sense of Ito".
There are alternative definitions, such as those "in the sense of Stratonovich",
but they are outside the scope of this text.

1.19.1. Integrals with respect to time

In this section, we give a definition for

$$I(X) = \int_a^b X(t)\, dt.$$

Let $t = (t_0, ..., t_n) \in \mathfrak{Part}((a, b))$ be an n-partition of $(a, b)$. Let us
consider the finite sum

$$I(X, t) = \sum_{i=1}^{n} X(t_{i-1})\, (t_i - t_{i-1}).$$

We say that

$$\int_a^b X(t)\, dt = Y$$

when

$$I(X, t) \to \int_a^b X(t)\, dt \text{ for } \delta(t) \to 0,$$

i.e.,

Definition 1.29.- Let $X \in L^2((a, b), L^2(\Omega, P))$. We say that
$\int_a^b X(t)\, dt = Y \in L^2(\Omega, P)$ if and only if, for any $\varepsilon > 0$, there exists
$\eta(\varepsilon) > 0$ such that

$$\forall t \in \mathfrak{Part}((a, b)) : \delta(t) \le \eta(\varepsilon) \Longrightarrow \|Y - I(X, t)\|_{L^2(\Omega, P)} \le \varepsilon. \; ■$$

Let us recall that the scalar product and the norm of $L^2(\Omega, P)$ are given by

$$(X, Y)_{L^2(\Omega, P)} = E(XY); \qquad \|Z\|_{L^2(\Omega, P)} = \sqrt{E(Z^2)}.$$

Let $n > 0$ and consider

$$I_n(X) = h \sum_{i=1}^{n} X(a + (i - 1)h); \qquad h = \frac{b - a}{n}.$$

When the integral exists, we also have

$$I_n(X) \to \int_a^b X(t)\, dt \text{ for } n \to +\infty.$$

The integral with respect to time is linear and continuous:

Theorem 1.10.- Let X, Y be two stochastic processes indexed by $(a, b)$ such
that $\int_a^b X(t)\, dt$ and $\int_a^b Y(t)\, dt$ both exist. Let $\alpha, \beta \in \mathbb{R}$. Then

$$\int_a^b (\alpha X(t) + \beta Y(t))\, dt = \alpha \int_a^b X(t)\, dt + \beta \int_a^b Y(t)\, dt.$$

In addition, there is a constant $C = C(a, b) \in \mathbb{R}$, independent from X
such that

$$\forall X \in L^2((a, b), L^2(\Omega, P)) : \left\|\int_a^b X(t)\, dt\right\|_{L^2(\Omega, P)} \le C\, \|X\|_{L^2((a,b), L^2(\Omega, P))}. \; ■$$

Proof.- For any $t = (t_0, ..., t_n) \in \mathfrak{Part}((a, b))$,

$$I(\alpha X + \beta Y, t) = \alpha I(X, t) + \beta I(Y, t),$$

so that $I(\alpha X + \beta Y) = \alpha I(X) + \beta I(Y)$. By the extension principle, it is
enough to consider $X \in \mathcal{S}p\,(t, (a, b), L^2(\Omega, P))$. For such an X, we have

$$X = \sum_{i=0}^{n_x - 1} X_i\, 1_{(x_i, x_{i+1})}(t),$$

where $x = (x_0, ..., x_{n_x}) \in \mathfrak{Part}((a, b))$. Let $n > 0$, $h = \dfrac{b - a}{n}$, $a_i =
a + (i - 1)h$ and $X_i = X(a_i)$ ($1 \le i \le n + 1$). Then, for $h \le \delta(x)$, we
have

$$X(t, \omega) = \sum_{i=1}^{n} X_i\, 1_{(a_i, a_{i+1})}(t); \qquad I_n(X) = h \sum_{i=1}^{n} X_i;$$

$$[I_n(X)]^2 = h^2 \left(\sum_{i=1}^{n} X_i\right)^2 = h^2 \sum_{i,j=1}^{n} X_i X_j \le h^2 \sum_{i,j=1}^{n} \frac{X_i^2 + X_j^2}{2}.$$

Thus,

$$[I_n(X)]^2 \le n h^2 \sum_{i=1}^{n} X_i^2 = (b - a)\, h \sum_{i=1}^{n} X_i^2$$

and

$$\|I_n(X)\|^2_{L^2(\Omega, P)} = E([I_n(X)]^2) \le (b - a)\, h \sum_{i=1}^{n} E(X_i^2) = (b - a) \int_{(a,b)} E(X^2) = (b - a)\, \|X\|^2_{L^2((a,b), L^2(\Omega, P))},$$

which establishes the result. ■
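As an illustration of the time integral, the left-endpoint sums $I_n(X) = h\sum_{i=1}^{n} X(a + (i-1)h)$ can be evaluated on simulated paths of a Wiener process. The sketch below takes X = W on (0, T) and compares the empirical second moment of $I_n(W)$ with the value $T^3/3$, which is the classical second moment of $\int_0^T W(t)\,dt$ (a standard fact that is not derived in the text); T, n and the number N of paths are assumed values.

```matlab
% Left-endpoint approximation of the time integral of a Wiener process on (0, T)
T = 1;  n = 200;  N = 10000;  h = T / n;   % assumed illustrative values
dW = sqrt(h) * randn(N, n);                % increments, one path per row
W = [zeros(N, 1), cumsum(dW, 2)];          % W(:, i) = W((i - 1) * h)
In = h * sum(W(:, 1:n), 2);                % I_n(W) for each path
m_hat = mean(In);                          % should be close to 0
v_hat = mean(In.^2);                       % compare with T^3 / 3
bound = T * (T^2 / 2);                     % (b - a) times the squared norm of W, int_0^T t dt
fprintf('mean = %g, second moment = %g (T^3/3 = %g), bound = %g\n', ...
        m_hat, v_hat, T^3 / 3, bound);
```

The empirical second moment stays below the bound $(b - a)\|W\|^2$, in agreement with the continuity estimate obtained in the proof.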


1.19.2. Integrals with respect to a process

In this section, we give a definition for

$$I(X, Y) = \int_a^b X(t)\, dY(t).$$

Let $t = (t_0, ..., t_n) \in \mathfrak{Part}((a, b))$ be an n-partition of $(a, b)$. Let us
consider the finite sum

$$I(X, Y, t) = \sum_{i=1}^{n} X(t_{i-1})\, (Y(t_i) - Y(t_{i-1})).$$

We say that

$$\int_a^b X(t)\, dY(t) = Z$$

when

$$I(X, Y, t) \to Z \text{ for } \delta(t) \to 0,$$

i.e.,

Definition 1.30.- Let $X, Y \in L^2((a, b), L^2(\Omega, P))$. We say that
$\int_a^b X(t)\, dY(t) = Z \in L^2(\Omega, P)$ if and only if, for any $\varepsilon > 0$, there exists
$\eta(\varepsilon) > 0$ such that

$$\forall t \in \mathfrak{Part}((a, b)) : \delta(t) \le \eta(\varepsilon) \Longrightarrow \|Z - I(X, Y, t)\|_{L^2(\Omega, P)} \le \varepsilon. \; ■$$

Let us consider $n > 0$ and, with $h = \dfrac{b - a}{n}$,

$$I_n(X, Y) = \sum_{i=1}^{n} X(a + (i - 1)h)\, (Y(a + ih) - Y(a + (i - 1)h)).$$

When the integral exists, we also have

$$I_n(X, Y) \to \int_a^b X(t)\, dY(t) \text{ for } n \to +\infty.$$
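The role of the left endpoint in the sum $I_n(X, Y) = \sum_{i=1}^{n} X(a + (i-1)h)\,(Y(a + ih) - Y(a + (i-1)h))$ can be seen on a simulated path. The sketch below takes X = Y = W, a Wiener process on (0, T), and checks the purely algebraic identity $\sum_i W_{i-1}(W_i - W_{i-1}) = \frac{1}{2}\big(W_n^2 - \sum_i (W_i - W_{i-1})^2\big)$ on that path; T and n are assumed values.

```matlab
% Left-endpoint sum I_n(W, W) on one simulated path of a Wiener process
T = 1;  n = 1000;  h = T / n;              % assumed values, (a, b) = (0, T)
dW = sqrt(h) * randn(1, n);                % increments of the path
W = [0, cumsum(dW)];                       % W(i) = W((i - 1) * h), i = 1, ..., n + 1
In = sum(W(1:n) .* diff(W));               % integrand evaluated at the left endpoints
qv = sum(diff(W).^2);                      % sum of squared increments
ref = (W(end)^2 - qv) / 2;                 % right-hand side of the algebraic identity
fprintf('I_n = %g, identity = %g, sum of squared increments = %g\n', In, ref, qv);
```

The sum of squared increments is close to T for small h, which is why the limit differs from the value $W(T)^2/2$ suggested by ordinary calculus.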


The integral with respect to a process is bilinear:

Proposition 1.32.- Let $X, Y, Z \in L^2((a, b), L^2(\Omega, P))$ be such that
$\int_a^b X(t)\, dY(t)$ and $\int_a^b X(t)\, dZ(t)$ exist. Let $\alpha, \beta \in \mathbb{R}$. Then

$$\int_a^b X(t)\, d(\alpha Y(t) + \beta Z(t)) = \alpha \int_a^b X(t)\, dY(t) + \beta \int_a^b X(t)\, dZ(t). \; ■$$

Proof.- For any $t = (t_0, ..., t_n) \in \mathfrak{Part}((a, b))$,

$$I(X, \alpha Y + \beta Z, t) = \alpha I(X, Y, t) + \beta I(X, Z, t),$$

so that $I(X, \alpha Y + \beta Z) = \alpha I(X, Y) + \beta I(X, Z)$. ■

Proposition 1.33.- Let $X, Y, Z \in L^2((a, b), L^2(\Omega, P))$ be such that
$\int_a^b X(t)\, dZ(t)$ and $\int_a^b Y(t)\, dZ(t)$ exist. Let $\alpha, \beta \in \mathbb{R}$. Then

$$\int_a^b (\alpha X(t) + \beta Y(t))\, dZ(t) = \alpha \int_a^b X(t)\, dZ(t) + \beta \int_a^b Y(t)\, dZ(t). \; ■$$

Proof.- For any $t = (t_0, ..., t_n) \in \mathfrak{Part}((a, b))$,

$$I(\alpha X + \beta Y, Z, t) = \alpha I(X, Z, t) + \beta I(Y, Z, t),$$

so that $I(\alpha X + \beta Y, Z) = \alpha I(X, Z) + \beta I(Y, Z)$. ■

1.19.3. Integrals with respect to a Wiener process
Integrals having the form

\[ I_t(\varphi) = \int_a^b \varphi(W(t))\, dt \quad \text{and} \quad I_W(\varphi) = \int_a^b \varphi(W(t))\, dW(t) \]

are a particular case of the preceding ones, which corresponds to

\[ I_t(\varphi) = I(\varphi(W)) \quad \text{and} \quad I_W(\varphi) = I(\varphi(W), W). \]

We have

Proposition 1.34.- Let $\varphi(W), \psi(W) \in L^2((a,b), L^2(\Omega, P))$. Then, for
$\alpha, \beta \in \mathbb{R}$,

\[ I_W(\alpha\varphi + \beta\psi) = \alpha I_W(\varphi) + \beta I_W(\psi). \]

In addition,

\[ E(I_W(\varphi)) = 0, \]

\[ \| I_W(\varphi) \|_{L^2(\Omega,P)} = \| \varphi(W) \|_{L^2((a,b),\, L^2(\Omega,P))}, \]

\[ \big( I_W(\varphi), I_W(\psi) \big)_{L^2(\Omega,P)} = \big( \varphi(W), \psi(W) \big)_{L^2((a,b),\, L^2(\Omega,P))}. \ ■ \]

Proof.- We have

\[ I_W(\alpha\varphi + \beta\psi) = I(\alpha\varphi(W) + \beta\psi(W), W) = \alpha I(\varphi(W), W) + \beta I(\psi(W), W) = \alpha I_W(\varphi) + \beta I_W(\psi). \]

For the rest of the equalities, the extension principle is applied: it is
enough to show the result for $\varphi, \psi$ such that $\varphi(W)$, $\psi(W)$ are simple
(piecewise constant in $t$). In this case,

\[ \varphi(W)(t,\omega) = \sum_{i=0}^{n_x - 1} \Phi_i \, 1_{(x_i, x_{i+1})}(t) \quad \text{and} \quad \psi(W)(t,\omega) = \sum_{i=0}^{n_y - 1} \Psi_i \, 1_{(y_i, y_{i+1})}(t), \]

where $x = (x_0, \ldots, x_{n_x}) \in \mathfrak{P}\mathrm{art}((a,b))$ and $y = (y_0, \ldots, y_{n_y}) \in \mathfrak{P}\mathrm{art}((a,b))$. For
$n > 0$ such that $h = \frac{b-a}{n} \le \min\{\delta(x), \delta(y)\}$, $a_i = a + ih$,

\[ \varphi_i = \varphi(W_i), \quad \psi_i = \psi(W_i), \quad W_i = W(a_i), \]

we have

\[ \varphi(W)(t,\omega) = \sum_{i=0}^{n-1} \varphi_i \, 1_{(a_i, a_{i+1})}(t) \quad \text{and} \quad \psi(W)(t,\omega) = \sum_{i=0}^{n-1} \psi_i \, 1_{(a_i, a_{i+1})}(t). \]

Thus,

\[ I_W(\varphi) = \sum_{i=0}^{n-1} \varphi_i \, (W_{i+1} - W_i) \quad \text{and} \quad I_W(\psi) = \sum_{i=0}^{n-1} \psi_i \, (W_{i+1} - W_i), \]

so that

\[ E(I_W(\varphi)) = \sum_{i=0}^{n-1} E\big( \varphi_i \, (W_{i+1} - W_i) \big). \]

Since $\varphi_i = \varphi(W_i)$ is independent from $W_{i+1} - W_i$:

\[ E(I_W(\varphi)) = \sum_{i=0}^{n-1} E(\varphi_i) \, \underbrace{E(W_{i+1} - W_i)}_{=0} = 0. \]

We also have

\[ I_W(\varphi)\, I_W(\psi) = \sum_{i,j=0}^{n-1} \varphi_i \psi_j \, (W_{i+1} - W_i)(W_{j+1} - W_j). \]

For $i > j$, $(W_{i+1} - W_i)$ is independent from $(W_{j+1} - W_j)$, $\varphi_i$ and $\psi_j$, so
that

\[ E\big( \varphi_i \psi_j (W_{i+1} - W_i)(W_{j+1} - W_j) \big) = \underbrace{E(W_{i+1} - W_i)}_{=0} \, E\big( \varphi_i \psi_j (W_{j+1} - W_j) \big) = 0. \]

Thus,

\[ E\big( I_W(\varphi)\, I_W(\psi) \big) = \sum_{i=0}^{n-1} E\big( \varphi_i \psi_i \, (W_{i+1} - W_i)^2 \big). \]

Since $(W_{i+1} - W_i)$ is independent from $\varphi_i$ and $\psi_i$:

\[ E\big( I_W(\varphi)\, I_W(\psi) \big) = \sum_{i=0}^{n-1} E(\varphi_i \psi_i) \, \underbrace{E\big( (W_{i+1} - W_i)^2 \big)}_{=h} = h \sum_{i=0}^{n-1} E(\varphi_i \psi_i) \]

and we have

\[ E\big( I_W(\varphi)\, I_W(\psi) \big) = \big( \varphi(W), \psi(W) \big)_{L^2((a,b),\, L^2(\Omega,P))}, \]

i.e.,

\[ \big( I_W(\varphi), I_W(\psi) \big)_{L^2(\Omega,P)} = \big( \varphi(W), \psi(W) \big)_{L^2((a,b),\, L^2(\Omega,P))}. \]

Taking $\psi = \varphi$ in this equality, we obtain

\[ \| I_W(\varphi) \|_{L^2(\Omega,P)} = \| \varphi(W) \|_{L^2((a,b),\, L^2(\Omega,P))}. \ ■ \]
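The two moment identities of Proposition 1.34 can be checked by simulation. The MATLAB sketch below is an illustration under stated assumptions, not the book's code: it takes $\varphi(s) = s$, $(a, b) = (0, 1)$, $W(0) = 0$, and approximates $I_W(\varphi)$ by the left-point sums used in the proof; the sample sizes and the seed are arbitrary.

```matlab
% Monte Carlo check of E(I_W(phi)) = 0 and E(I_W(phi)^2) = h * sum_i E(phi_i^2).
rng(1);                                   % fixed seed, for reproducibility only
a = 0; b = 1; n = 100; ns = 20000;        % n time steps, ns sample paths (assumed)
h = (b - a) / n;
phi = @(s) s;                             % integrand phi(W), chosen for illustration

dW = sqrt(h) * randn(ns, n);              % increments W_{i+1} - W_i ~ N(0, sqrt(h))
W  = [zeros(ns, 1), cumsum(dW, 2)];       % W_0 = 0, W_1, ..., W_n on each path
IW = sum(phi(W(:, 1:n)) .* dW, 2);        % left-point sums, one value per path

meanIW = mean(IW);                              % estimate of E(I_W(phi))
msIW   = mean(IW.^2);                           % estimate of E(I_W(phi)^2)
rhs    = h * sum(mean(phi(W(:, 1:n)).^2, 1));   % estimate of h * sum_i E(phi_i^2)
fprintf('E(I_W) ~ %.4f, E(I_W^2) ~ %.4f, h*sum E(phi_i^2) ~ %.4f\n', meanIW, msIW, rhs);
```

For this choice of $\varphi$, $h \sum_i E(\varphi_i^2) = h \sum_i a_i = (1 - 1/n)/2$, so both estimates should be close to $0.495$, up to sampling error.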

1.20. Ito Calculus 
1.20.1. Ito’s formula 


We observe that

\[ \int_a^b W(t)\, dW(t) = \frac{W(b)^2 - W(a)^2}{2} - \frac{b-a}{2}, \]

so that Ito integrals do not follow the standard calculus rules. We have

Proposition 1.35 (Ito's formula).- Let $u \in \mathcal{D}(\mathbb{R})$ (i.e., $u \in C^\infty(\mathbb{R})$ and its
support is compact) be such that $\int_a^b u'(W(t))\, dW(t)$ and
$\int_a^b u''(W(t))\, dt$ both exist. Then

\[ \int_a^b u'(W(t))\, dW(t) = u(W(b)) - u(W(a)) - \frac{1}{2} \int_a^b u''(W(t))\, dt. \ ■ \]
PROOF.- Let us consider $n > 0$, $h = (b-a)/n$, $a_i = a + (i-1)h$ and $W_i = W(a_i)$,
$1 \le i \le n+1$. We have

\[ u(W_{i+1}) - u(W_i) - u'(W_i)(W_{i+1} - W_i) - \frac{1}{2} u''(W_i)(W_{i+1} - W_i)^2 = \frac{1}{6} u'''(\xi_i)(W_{i+1} - W_i)^3, \]

where $\xi_i$ is a random variable. Thus,

\[ u(W(b)) - u(W(a)) - T_n - \frac{1}{2} V_n - U_n = 0, \]

where

\[ T_n = \sum_{i=1}^{n} u'(W_i)(W_{i+1} - W_i), \quad V_n = \sum_{i=1}^{n} u''(W_i)(W_{i+1} - W_i)^2, \quad U_n = \frac{1}{6} \sum_{i=1}^{n} u'''(\xi_i)(W_{i+1} - W_i)^3. \]

1) We have (by definition),

\[ T_n \to \int_a^b u'(W(t))\, dW(t) \quad \text{when } n \to +\infty. \]

2) Let

\[ S_n = \sum_{i=1}^{n} (W_{i+1} - W_i)^2. \]

Let us consider $Z_i = (W_{i+1} - W_i)^2 - h$, so that $S_n - (b-a) = \sum_{i=1}^{n} Z_i$. We have:

\[ Z_i Z_j = (W_{i+1} - W_i)^2 (W_{j+1} - W_j)^2 - h (W_{i+1} - W_i)^2 - h (W_{j+1} - W_j)^2 + h^2. \]

For $i \ne j$, $W_{i+1} - W_i$ and $W_{j+1} - W_j$ are independent, so that

\[ E\big( (W_{i+1} - W_i)^2 (W_{j+1} - W_j)^2 \big) = \underbrace{E\big( (W_{i+1} - W_i)^2 \big)}_{=h} \, \underbrace{E\big( (W_{j+1} - W_j)^2 \big)}_{=h} = h^2 \]

and

\[ E(Z_i Z_j) = h^2 - h^2 - h^2 + h^2 = 0. \]

For $i = j$, we have

\[ Z_i^2 = (W_{i+1} - W_i)^4 - 2h (W_{i+1} - W_i)^2 + h^2, \]

\[ E(Z_i^2) = M_4(W_{i+1} - W_i) - 2h M_2(W_{i+1} - W_i) + h^2 = 3h^2 - 2h^2 + h^2 = 2h^2. \]

Thus,

\[ E\big( (S_n - (b-a))^2 \big) = E\Big( \Big( \sum_{i=1}^{n} Z_i \Big)^2 \Big) = \sum_{i=1}^{n} E(Z_i^2) = 2nh^2 = \frac{2(b-a)^2}{n} \]

and

\[ E\big( (S_n - (b-a))^2 \big) \to 0 \quad \text{when } n \to +\infty. \]

3) Let

\[ M_3 = \max \{ |u'''(s)| : s \in \mathbb{R} \} < \infty. \]

Let $B_i = \frac{1}{6} u'''(\xi_i)(W_{i+1} - W_i)^3$. We have

\[ B_i^2 \le \frac{M_3^2}{36} (W_{i+1} - W_i)^6. \]

Since $W_{i+1} - W_i \sim N(0, \sqrt{h})$:

\[ E\big( (W_{i+1} - W_i)^6 \big) = 15 h^3, \]

so that there exists a constant $C \in \mathbb{R}$ such that

\[ E(B_i^2) \le C h^3. \]

Now,

\[ U_n = \sum_{i=1}^{n} B_i, \]

so that

\[ E(U_n^2) = \sum_{i,j=1}^{n} E(B_i B_j) \le \sum_{i,j=1}^{n} \sqrt{E(B_i^2)} \sqrt{E(B_j^2)} \le n^2 C h^3 = \frac{C (b-a)^3}{n} \]

and

\[ E(U_n^2) \to 0 \quad \text{when } n \to +\infty. \]

4) Let

\[ M_2 = \max \{ |u''(s)| : s \in \mathbb{R} \} < \infty \]

and

\[ A_i = u''(W_i)(W_{i+1} - W_i)^2 - u''(W_i)(a_{i+1} - a_i) = u''(W_i)\, Z_i. \]

For $i > j$, $W_{i+1} - W_i$ is independent of $W_i$, $W_{j+1} - W_j$ and $W_j$, so
that $Z_i$ is independent of $u''(W_i)$, $u''(W_j)$ and $Z_j$. Thus, for $i > j$,

\[ E(A_i A_j) = \underbrace{E(Z_i)}_{=0} \, E\big( u''(W_i)\, u''(W_j)\, Z_j \big) = 0. \]

For $i = j$, we have

\[ E(A_i^2) \le M_2^2 E(Z_i^2) = 2 M_2^2 h^2. \]

Thus,

\[ E\Big( \Big( V_n - h \sum_{i=1}^{n} u''(W_i) \Big)^2 \Big) = E\Big( \Big( \sum_{i=1}^{n} A_i \Big)^2 \Big) = \sum_{i=1}^{n} E(A_i^2) \le 2 n M_2^2 h^2 = \frac{2 M_2^2 (b-a)^2}{n} \]

and

\[ E\Big( \Big( V_n - h \sum_{i=1}^{n} u''(W_i) \Big)^2 \Big) \to 0 \quad \text{when } n \to +\infty, \]

so that

\[ V_n \to \int_a^b u''(W(t))\, dt \quad \text{when } n \to +\infty. \]

5) Thus,

\[ T_n + \frac{1}{2} V_n + U_n \to \int_a^b u'(W(t))\, dW(t) + \frac{1}{2} \int_a^b u''(W(t))\, dt, \]

so that

\[ u(W(b)) - u(W(a)) - \int_a^b u'(W(t))\, dW(t) - \frac{1}{2} \int_a^b u''(W(t))\, dt = 0 \]

for any $u \in \mathcal{D}(\mathbb{R})$.

Invoking the extension principle, this result applies to any $u$ belonging to
the closure and completion of $\mathcal{D}(\mathbb{R})$. ■
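Ito's formula can be checked numerically on one discretized Wiener path. The MATLAB sketch below is an illustration under assumptions: it uses $u(s) = \sin(s)$ (bounded and smooth, but not compactly supported, so the extended formula is assumed to apply), $(a, b) = (0, 1)$, $W(0) = 0$, and replaces both integrals by left-point sums.

```matlab
% Pathwise check of: int u'(W) dW = u(W(b)) - u(W(a)) - 1/2 * int u''(W) dt.
rng(2);                                   % fixed seed, for reproducibility only
a = 0; b = 1; n = 1e5;                    % number of time steps (assumed)
h  = (b - a) / n;
dW = sqrt(h) * randn(1, n);               % Wiener increments
W  = [0, cumsum(dW)];                     % W(a) = 0, then W at a + h, ..., b

u   = @(s) sin(s);                        % test function (assumption)
du  = @(s) cos(s);                        % u'
d2u = @(s) -sin(s);                       % u''

lhs = sum(du(W(1:n)) .* dW);                              % T_n: sum of u'(W_i) (W_{i+1} - W_i)
rhs = u(W(end)) - u(W(1)) - 0.5 * h * sum(d2u(W(1:n)));   % right-hand side with a Riemann sum
err = abs(lhs - rhs);
fprintf('lhs = %.5f, rhs = %.5f, |difference| = %.2e\n', lhs, rhs, err);
```

The difference is dominated by the term $\frac{1}{2} \sum_i u''(W_i) Z_i$ of the proof, whose mean square is of order $1/n$, so it should decrease roughly like $n^{-1/2}$ as the grid is refined.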
Ito's formula extends to the multidimensional situation where
$u : \mathbb{R}^p \to \mathbb{R}$:

\[ \int_a^b \nabla u(W(t))\, dW(t) = u(W(b)) - u(W(a)) - \frac{1}{2} \int_a^b \Delta u(W(t))\, dt. \]

1.20.2. Ito stochastic diffusions
We write

\[ dX_t = a(t, W_t)\, dW_t + b(t, W_t)\, dt \]

for the equality

\[ X(t) - X(0) = \int_0^t a(s, W(s))\, dW(s) + \int_0^t b(s, W(s))\, ds \]

and we say that the stochastic process $X$ is a stochastic diffusion.

Ito's formula reads as

\[ du(W_t) = u'(W_t)\, dW_t + \frac{1}{2} u''(W_t)\, dt. \]

We have:

Proposition 1.36.- Let $X, Y \in L^2((a,b), L^2(\Omega, P))$ be such that
$dX_t = a_X(t, W_t)\, dW_t + b_X(t, W_t)\, dt$ and $dY_t = a_Y(t, W_t)\, dW_t + b_Y(t, W_t)\, dt$. Let
$\alpha, \beta \in \mathbb{R}$. Then

\[ d(\alpha X_t + \beta Y_t) = [\alpha a_X(t, W_t) + \beta a_Y(t, W_t)]\, dW_t + [\alpha b_X(t, W_t) + \beta b_Y(t, W_t)]\, dt. \ ■ \]

Proof.- The result is a consequence of the linearity of Ito integrals. ■

Proposition 1.37.- Let $x, y : \mathbb{R} \to \mathbb{R}$, $X = x(W)$, $Y = y(W)$ and
$Z = XY$ be regular enough. Then

\[ dZ_t = X_t\, dY_t + Y_t\, dX_t + x'(W_t)\, y'(W_t)\, dt. \ ■ \]

PROOF.- Let $z(s) = x(s)\, y(s) = xy(s)$. From Ito's formula:

\[ dZ_t = z'(W_t)\, dW_t + \frac{1}{2} z''(W_t)\, dt. \]

Now,

\[ z'(s) = x'(s)\, y(s) + x(s)\, y'(s) \]

and

\[ z''(s) = x''(s)\, y(s) + 2 x'(s)\, y'(s) + x(s)\, y''(s). \]

Thus,

\[ dZ_t = [x'(W_t)\, Y_t + X_t\, y'(W_t)]\, dW_t + \frac{1}{2} [x''(W_t)\, Y_t + 2 x'(W_t)\, y'(W_t) + X_t\, y''(W_t)]\, dt, \]

i.e.,

\[ dZ_t = X_t \underbrace{\Big[ y'(W_t)\, dW_t + \frac{1}{2} y''(W_t)\, dt \Big]}_{dY_t} + Y_t \underbrace{\Big[ x'(W_t)\, dW_t + \frac{1}{2} x''(W_t)\, dt \Big]}_{dX_t} + x'(W_t)\, y'(W_t)\, dt, \]

which establishes the result. ■

Corollary 1.15.- If $dX_t = a_X(t, W_t)\, dW_t + b_X(t, W_t)\, dt$,
$dY_t = a_Y(t, W_t)\, dW_t + b_Y(t, W_t)\, dt$ and $Z = XY$, then

\[ dZ_t = X_t\, dY_t + Y_t\, dX_t + [a_X(t, W_t)\, a_Y(t, W_t)]\, dt. \ ■ \]

PROOF.- The result is an immediate consequence of the preceding
proposition. ■
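As a short worked illustration of the product rule (an example added here, not taken from the book), take $x(s) = y(s) = s$, so that $X = Y = W$ and $Z = W^2$. The result can be cross-checked against Ito's formula applied to $u(s) = s^2$.

```latex
% Product rule with X = Y = W, i.e. x(s) = y(s) = s and x'(s) = y'(s) = 1:
\begin{align*}
  d(W_t^2) &= W_t \, dW_t + W_t \, dW_t + x'(W_t)\, y'(W_t)\, dt \\
           &= 2 W_t \, dW_t + dt .
\end{align*}
% Cross-check with Ito's formula, u(s) = s^2, u'(s) = 2s, u''(s) = 2:
\begin{align*}
  du(W_t) &= u'(W_t)\, dW_t + \tfrac{1}{2} u''(W_t)\, dt
           = 2 W_t \, dW_t + dt .
\end{align*}
```

The extra term $dt$ is exactly what distinguishes the Ito product rule from the classical Leibniz rule.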

Proposition 1.38.- Let $x, y : \mathbb{R} \to \mathbb{R}$, $X = x(W)$, $Z = y(x(W))$ be
regular enough. Then

\[ dZ_t = y'(X_t)\, dX_t + \frac{1}{2} y''(X_t)\, (x'(W_t))^2\, dt. \ ■ \]

PROOF.- Let $z(s) = y(x(s)) = y \circ x(s)$. From Ito's formula:

\[ dZ_t = z'(W_t)\, dW_t + \frac{1}{2} z''(W_t)\, dt. \]

Now,

\[ z'(s) = [y(x(s))]' = y'(x(s))\, x'(s) \]

and

\[ z''(s) = [y'(x(s))\, x'(s)]' = y''(x(s))\, [x'(s)]^2 + y'(x(s))\, x''(s). \]

Thus,

\[ dZ_t = y'(X_t)\, x'(W_t)\, dW_t + \frac{1}{2} \big[ y''(X_t)\, [x'(W_t)]^2 + y'(X_t)\, x''(W_t) \big]\, dt, \]

i.e.,

\[ dZ_t = y'(X_t) \underbrace{\Big[ x'(W_t)\, dW_t + \frac{1}{2} x''(W_t)\, dt \Big]}_{dX_t} + \frac{1}{2} y''(X_t)\, [x'(W_t)]^2\, dt \]

and we have the result claimed. ■

Corollary 1.16.- Let $Y(t) = F(X(t))$, where $dX_t = a_X(t, W_t)\, dW_t + b_X(t, W_t)\, dt$. Then

\[ dY_t = F'(X_t)\, dX_t + \frac{1}{2} F''(X_t)\, (a_X(t, W_t))^2\, dt. \ ■ \]

PROOF.- It is an immediate consequence of the preceding proposition. ■
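A short worked instance of Corollary 1.16 may help fix ideas (this example is an addition, with the choice of $F$ and $X$ being an assumption): take $X = W$, i.e. $a_X = 1$ and $b_X = 0$, and $F(x) = e^x$.

```latex
% Corollary 1.16 with X = W (a_X = 1, b_X = 0) and F(x) = exp(x),
% so that F'(x) = F''(x) = exp(x) and Y_t = exp(W_t):
\begin{align*}
  dY_t &= F'(W_t)\, dW_t + \tfrac{1}{2} F''(W_t)\, (1)^2 \, dt \\
       &= Y_t \, dW_t + \tfrac{1}{2} Y_t \, dt .
\end{align*}
```

So $Y_t = e^{W_t}$ is itself a stochastic diffusion, whose coefficients depend on the current value $Y_t$.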

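When $F = F(t, x)$, $Y(t) = F(t, X(t))$ and $dX_t = A_t\, dW_t + B_t\, dt$, we
have

\[ dY_t = \frac{\partial F}{\partial t}(t, X_t)\, dt + \frac{\partial F}{\partial x}(t, X_t)\, dX_t + \frac{1}{2} \frac{\partial^2 F}{\partial x^2}(t, X_t)\, (A_t)^2\, dt. \]

When $F = F(t, x, z)$, $Y(t) = F\big(t, X(t), \int_0^t g(s, X(s))\, ds\big)$ and
$dX_t = A_t\, dW_t + B_t\, dt$, we have, by taking $P(t) = \big(t, X(t), \int_0^t g(s, X(s))\, ds\big)$,

\[ dY_t = \frac{\partial F}{\partial t}(P_t)\, dt + \frac{\partial F}{\partial x}(P_t)\, dX_t + \frac{1}{2} \frac{\partial^2 F}{\partial x^2}(P_t)\, (A_t)^2\, dt + g(t, X_t)\, \frac{\partial F}{\partial z}(P_t)\, dt. \]

In the multidimensional situation, where $F = F(t, x)$,
$Y(t) = F(t, X(t))$, $F : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^m$, we have, for $1 \le i \le m$,

\[ d(Y_i)_t = \frac{\partial F_i}{\partial t}(t, X_t)\, dt + (A\, dX_t)_i + \frac{1}{2} (dX_t)^t B_i\, dX_t, \]

where

\[ A_{ij}(t, x) = \frac{\partial F_i}{\partial x_j}(t, x), \qquad (B_i)_{jk}(t, x) = B_{ijk}(t, x) = \frac{\partial^2 F_i}{\partial x_j \partial x_k}(t, x). \]

It is usual to write these equalities by using the Einstein convention of summation
over repeated indexes:

\[ d(Y_i)_t = \frac{\partial F_i}{\partial t}(t, X_t)\, dt + A_{ij}\, d(X_j)_t + \frac{1}{2} B_{ijk}\, d(X_j)_t\, d(X_k)_t. \]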

The simulation of stochastic diffusions or stochastic differential equations
requires a discretization in time. As for Wiener processes, the
reader may find in the literature many works on this topic. Here, we illustrate
the simulation of Ito's diffusion $dX_t = a(t, W_t)\, dW_t + b(t, W_t)\, dt$, $X_0$ given,
by using a simple Euler discretization:

Listing 1.22. Simple simulation of dX_t = a(t, W_t) dW_t + b(t, W_t) dt

function X = ito_diffusion(ndimw, nstep, timestep, a, b, X0)
%
% generates nstep steps of the process
% dX_t = a(t, W_t) dW_t + b(t, W_t) dt, X_0 = X0
%
% IN:
% ndimw: dimension of the Wiener process - type integer
% nstep: number of steps - type integer
% timestep: the time step - type double
% a: the first coefficient - type anonymous function, called as a(w, t)
% b: the second coefficient - type anonymous function, called as b(w, t)
% X0: the initial value - type double
%
% OUT:
% X: table - type double
% X(:, i) is the value of X((i - 1)*timestep)
%
ndim = length(X0);
X = zeros(ndim, nstep + 1);
sqrdt = sqrt(timestep);
w = zeros(ndimw, 1);        % current value of the Wiener process, W_0 = 0
X(:, 1) = X0;
t = 0;
for i = 1:nstep
    dw = sqrdt*randn(ndimw, 1);    % Wiener increment over one time step
    at = a(w, t);                  % expected size: ndim x ndimw
    bt = b(w, t);                  % expected size: ndim x 1
    X(:, i + 1) = X(:, i) + at*dw + bt*timestep;
    t = t + timestep;
    w = w + dw;
end
end



imulation of dX, = a(t, X,)dW, + b(t^)dt 



Figure 1.7. Simulation results 


Analogously, the simulation of the stochastic differential equation
$dX_t = a(t, X_t)\, dW_t + b(t, X_t)\, dt$, $X_0$ given, is implemented by a simple Euler
discretization:

Listing 1.23. Simple simulation of dX_t = a(t, X_t) dW_t + b(t, X_t) dt

function X = ito_sde(ndimw, nstep, timestep, a, b, X0)
%
% generates nstep steps of the process
% dX_t = a(t, X_t) dW_t + b(t, X_t) dt, X_0 = X0
%
% IN:
% ndimw: dimension of the Wiener process - type integer
% nstep: number of steps - type integer
% timestep: the time step - type double
% a: the first coefficient - type anonymous function, called as a(x, t)
% b: the second coefficient - type anonymous function, called as b(x, t)
% X0: the initial value - type double
%
% OUT:
% X: table - type double
% X(:, i) is the value of X((i - 1)*timestep)
%
ndim = length(X0);
X = zeros(ndim, nstep + 1);
sqrdt = sqrt(timestep);
w = zeros(ndimw, 1);        % current value of the Wiener process (not used by a and b)
X(:, 1) = X0;
t = 0;
for i = 1:nstep
    dw = sqrdt*randn(ndimw, 1);    % Wiener increment over one time step
    at = a(X(:, i), t);            % expected size: ndim x ndimw
    bt = b(X(:, i), t);            % expected size: ndim x 1
    X(:, i + 1) = X(:, i) + at*dw + bt*timestep;
    t = t + timestep;
    w = w + dw;
end
end
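To see the Euler scheme at work on an equation with a known answer, the following self-contained MATLAB sketch applies the same update rule to the scalar equation $dX_t = \sigma X_t\, dW_t + \mu X_t\, dt$, for which $E(X_T) = X_0 e^{\mu T}$. The equation, the parameter values and the vectorization over sample paths are assumptions made for this illustration; the loop body mirrors the listing above but does not call it.

```matlab
% Euler discretization of dX_t = sigma*X_t dW_t + mu*X_t dt, many paths at once.
rng(3);                                   % fixed seed, for reproducibility only
mu = 0.5; sigma = 0.2; X0 = 1; T = 1;     % illustrative parameter values (assumed)
nstep = 200; npath = 20000;
h = T / nstep;

a = @(x, t) sigma * x;                    % coefficient of dW_t
b = @(x, t) mu * x;                       % coefficient of dt

X = X0 * ones(npath, 1);                  % one entry per simulated path
t = 0;
for i = 1:nstep
    dw = sqrt(h) * randn(npath, 1);       % independent Wiener increments
    X = X + a(X, t) .* dw + b(X, t) * h;  % Euler update
    t = t + h;
end

mc_mean    = mean(X);                     % Monte Carlo estimate of E(X_T)
exact_mean = X0 * exp(mu * T);            % known mean of the exact solution
fprintf('E(X_T): simulated %.4f, exact %.4f\n', mc_mean, exact_mean);
```

The small residual gap has two sources: the time discretization, since the Euler mean is $X_0 (1 + \mu h)^{n}$ rather than $X_0 e^{\mu T}$, and the sampling error of the Monte Carlo average.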
2.1. Construction of a stochastic model

The construction of stochastic simulations of a system is generally 
organized as follows: 

1) First, we construct a deterministic model for the system. 

2) In the second step, randomness is introduced by transforming the
deterministic model into a parametric stochastic model: some parameters
of the system are selected to be considered as random and their
probabilistic models are generated, i.e. their respective probability density
functions are determined.

3) A numerical method of stochastic simulation - most frequently the 
Monte Carlo method - is applied in order to generate information and facilitate 
statistical inferences about the system response. 

The steps of the construction of deterministic and stochastic models of the
system are essential in order to obtain realistic results in stochastic
simulations. These two models are used in the stochastic simulation (for
instance, by the Monte Carlo method) and directly influence the responses.
Examples of construction of stochastic models can be found in 
[RIT 09, RIT 08, RIT 10b, RIT 10a, RIT 12]. 

The Monte Carlo method typically generates samples and uses them in
order to generate information: on the one hand, samples of the parameters
selected to be random are generated by using the pre-assigned probability
distributions defined by the stochastic model and, on the other hand, each
sample generates a variate from the system response by using the
deterministic model. These deterministic results are collected and used for the
generation of the targeted quantities. For instance, they may be aggregated in
order to produce statistics and approximations of the probability distribution
of the system response.
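The three steps can be sketched in a few MATLAB lines. Everything specific in this example is an assumption made for illustration: the deterministic model (displacement $u = F/k$ of a linear spring), the choice of the stiffness $k$ as the random parameter and its uniform distribution on $(0.8, 1.2)$.

```matlab
% Monte Carlo propagation of a random parameter through a deterministic model.
rng(5);                                   % fixed seed, for reproducibility only
ns = 1e5;                                 % number of samples (assumed)
F  = 1;                                   % load, hypothetical value

% Step 1: deterministic model (hypothetical example): u = F / k
model = @(k) F ./ k;

% Step 2: probabilistic model of the random parameter (assumed uniform)
k = 0.8 + 0.4 * rand(ns, 1);

% Step 3: one deterministic evaluation per sample, then statistics
u = model(k);
u_mean = mean(u);
u_std  = std(u);
[counts, edges] = histcounts(u, 30);          % histogram of the response
pdf_est = counts ./ (ns * diff(edges));       % normalized: approximates the density of u
fprintf('mean = %.4f, std = %.4f\n', u_mean, u_std);
```

For this particular model the mean is known in closed form, $E(u) = F \ln(1.2/0.8)/0.4 \approx 1.0137$, which gives a simple check of the sampling procedure.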


Figure 2.1. Monte Carlo method (diagram blocks: observations, deterministic model, processing of the samples, statistics)



Thus, the choice of the probability densities of the parameters selected to 
be random is a critical point, which requires some precaution. 

These densities may be determined in various ways. For instance,
experimental data may be used to construct histograms that represent the 
distribution of the parameters under analysis. An alternative way consists of 
determining a representation of these random parameters as functions of 
known random variables, by using the methods previously presented. 

However, in many practical situations, performing experiments may be 
costly or impossible. In these situations, other arguments must be used in 
order to determine the probability density functions under analysis. In 
general, theoretical, physical or mathematical arguments may be used. 
Nevertheless, there is a systematic and popular procedure leading to the 
construction of such probability densities: the principle of maximum entropy 
(PME). In the following, we examine how this procedure may be applied and 
how it works in some simple situations. 
2.2. The principle of maximum entropy

The PME is an efficient method for the generation of probabilistic models 
of scalar or vector parameters, and may be used for continuous or discrete 
random variables. 

PME is a tool for the determination of probability density functions
constrained by the statistical information available on the
variables, such as, for instance, their support, their mean or their covariance
matrix. Several pieces of information may be taken into account
simultaneously: the result directly depends on the available information and
changes when information is added or removed.

As its name indicates, the PME looks for a solution that maximizes the 
entropy subject to constraints defined by the available information. It reads as: 

Among all probability distributions which satisfy the restrictions given by 
the available information, select the one that maximizes the entropy. 

Naively, such an approach uses entropy as a measure of uncertainty and 
it may be interpreted as looking for the distribution that corresponds to the 
maximal uncertainty and the minimal quantity of information, i.e. assuming 
that the available information is minimal. 

The idea of information directly leads to information theory and measures
of quantity of information: when using PME, the popular approach consists
of the use of Shannon's entropy (see [SHA 48]). The idea of maximizing the
entropy was proposed by Jaynes (see [JAY 57a, JAY 57b]) in the context of
statistical mechanics, a field where the notion of entropy is close to Shannon's
entropy.

In the following, we will illustrate the use of the PME by determining the 
distributions of discrete and continuous variables. We first consider discrete 
random variables taking a finite set of values, and then discrete variables 
taking an infinite set of values. Subsequently, we will present examples with 
continuous random variables and, finally, continuous random vectors. 



136 Uncertainty Quantification and Stochastic Modeling with Matlab® 


2.2.1. Discrete random variables 

Let us consider a discrete random variable $X$, defined on a finite set
$\Omega = \{\omega_1, \ldots, \omega_k\}$ and, thus, taking a finite number of values $x = \{x_1, \ldots, x_n\}$
($n \le k$).

Let us apply the PME in order to determine its distribution: in our first
approach, we look for the values $p_i = P(X = x_i)$, $i = 1, \ldots, n$. These
unknowns are grouped in a vector $p = (p_1, \ldots, p_n)$. In this case,
Shannon's entropy is given by:

\[ S(p) = -\sum_{i=1}^{n} p_i \ln(p_i) \quad (\text{by convention } 0 \ln 0 = 0). \qquad [2.1] \]

Let us consider first the situation where no additional information is
available: in this case, the only restrictions to be taken into account consist of
imposing that the values of $p$ define a probability, i.e. these are non-negative
values having their sum equal to one. Let $C$ be the admissible set (i.e. the set
of the values of $p$ which satisfy the restrictions): in this situation, we have

\[ C = \{ p : p_1 + \cdots + p_n = 1, \ p_i \ge 0 \ \forall i \in \{1, \ldots, n\} \}. \qquad [2.2] \]

The PME states that

\[ p = \arg\max \{ S(q) : q \in C \}. \qquad [2.3] \]

Equation [2.3] defines a constrained optimization problem, which may be
solved by introducing the Lagrange multipliers associated with $C$: let us
consider $\lambda = (\lambda_0, \ldots, \lambda_n)$ and the Lagrangian

\[ L(p, \lambda) = S(p) - \lambda_0 \Big( \sum_{i=1}^{n} p_i - 1 \Big) - \sum_{i=1}^{n} \lambda_i p_i = S(p) - \sum_{i=1}^{n} (\lambda_0 + \lambda_i)\, p_i + \lambda_0. \qquad [2.4] \]
With these notations, $\lambda_0$ is the Lagrange multiplier associated with the
condition that the sum of the terms is equal to one and $\lambda_i$, with $i > 0$, is the
Lagrange multiplier associated with the condition $p_i \ge 0$. From the general
properties of Lagrange multipliers, we have

\[ \lambda_i \ge 0 \ ; \quad \lambda_i = 0, \ \text{if } p_i > 0 \quad (i > 0). \qquad [2.5] \]

Let us denote

\[ B = \{ i \in \{1, \ldots, n\} : p_i > 0 \} \ ; \quad N = \{ i \in \{1, \ldots, n\} : p_i = 0 \}, \qquad [2.6] \]

where $B$ is the set of indexes such that $p_i > 0$ and $N$ is the set
of the indexes for which $p_i = 0$. Let $n_B$ be the number of elements of $B$. We
have

\[ \sum_{i \in B} p_i = \sum_{i \in B} p_i + \sum_{i \in N} p_i = \sum_{i=1}^{n} p_i = 1, \qquad [2.7] \]

so that $B \ne \emptyset$ (otherwise, the sum of the elements of $p$ cannot be equal to
one) and $n_B > 0$. Since

\[ i \in B \implies \lambda_i = 0, \]

we have:

\[ \frac{\partial L}{\partial p_i}(p, \lambda) = -\ln(p_i) - p_i \frac{1}{p_i} - \lambda_0 = 0 \implies -\ln(p_i) - \lambda_0 = 1, \quad i \in B, \qquad [2.8] \]

so that:

\[ \ln(p_i) = -1 - \lambda_0 \implies p_i = e^{-\lambda_0 - 1}, \quad i \in B. \qquad [2.9] \]

Thus,

\[ p_i = \bar{p}, \ \forall i \in B \ ; \quad \bar{p} = e^{-\lambda_0 - 1}. \]

Since their sum is equal to one, we have:

\[ 1 = \sum_{i \in B} p_i = \sum_{i \in B} \bar{p} = n_B\, \bar{p}. \]

Thus,

\[ p_i = \frac{1}{n_B}, \quad i \in B. \qquad [2.10] \]

So, if no information is available about the random variable X, the PME 
provides a set of solutions formed by all the uniform distributions defined on 
non-empty parts of x. If we add the supplementary information that all the 
values have a strictly positive probability, then the PME provides a single 
solution, which is the uniform distribution on the whole x. 
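A quick numerical experiment supports this conclusion. The MATLAB sketch below (an illustration; the value $n = 6$ and the random search are assumptions) compares the entropy of the uniform distribution, which equals $\ln n$, with the entropy of randomly drawn elements of the admissible set $C$.

```matlab
% Entropy of the uniform distribution versus random points of the admissible set.
rng(4);                                        % fixed seed, for reproducibility only
n = 6;                                         % number of values (assumed)
S = @(p) -sum(p(p > 0) .* log(p(p > 0)));      % Shannon entropy, with 0 ln 0 = 0

Su = S(ones(1, n) / n);                        % uniform distribution: S = ln(n)

Smax = -Inf;
for k = 1:1000
    q = rand(1, n);
    q = q / sum(q);                            % non-negative entries summing to one
    Smax = max(Smax, S(q));                    % best entropy found by random search
end
fprintf('S(uniform) = %.4f, best random S = %.4f\n', Su, Smax);
```

No randomly drawn distribution exceeds $\ln n$, in agreement with the fact that the uniform distribution on the whole of $x$ maximizes the entropy.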

In our second approach, we look for the probabilities of the elements of $\Omega$
and our unknowns are $p_i = P(\omega_i)$. Analogously to the preceding situation,
the PME provides a set of solutions formed by all the uniform distributions
defined on non-empty parts of $\Omega$. If we add the supplementary information
that all the values have a strictly positive probability, then the PME provides
a single solution, which is the uniform distribution on the whole $\Omega$. In this
case, the distribution of $X$ may not correspond to a uniform distribution. For
instance, in the last situation, where all the elements have a strictly positive
probability, $P(X = x_i) = \mathrm{card}(X^{-1}(\{x_i\}))/k$, where $\mathrm{card}(S)$ denotes the
number of elements of $S$: if these subsets do not contain the same number of
elements, $X$ is not uniformly distributed.

This simple example shows that the results provided by PME are closely 
connected to the information used, namely the support of the distribution. 

It is interesting to notice that, if $x$ (or $\Omega$) is a countably infinite set, then
the result is the set of all the uniform distributions on finite non-empty parts
of $x$ (or $\Omega$). In this case, assuming that all the elements have a strictly positive
probability leads to a contradiction, since the condition $1 = \sum_{i \in B} \bar{p}$ cannot
be satisfied if $B$ contains infinitely many elements. However, additional
information about $X$ may keep a solution possible (see example 2.2).

In the following, we limit ourselves to the situation where all the elements
have a strictly positive probability. Thus, we do not consider Lagrange
multipliers associated with the non-negativity condition of the probabilities:
they are assumed to be null in the following.

Let us now consider a situation where more information is prescribed. Let
us suppose that we have $m$ pieces of information of the form

\[ \sum_{i=1}^{n} p_i\, g_{r,i}(x) = a_r, \quad r = 1, \ldots, m. \qquad [2.11] \]

In this case, the admissible set is (recall that we assume that all the $p_i$ are
strictly positive)

\[ C = \Big\{ p : \sum_{i=1}^{n} p_i = 1 \ ; \ \sum_{i=1}^{n} p_i\, g_{r,i}(x) = a_r, \ r = 1, \ldots, m \Big\}. \qquad [2.12] \]

As in the preceding situation, let us introduce the Lagrange multipliers
$\lambda = (\lambda_0, \ldots, \lambda_m)$, where $\lambda_0$ is the Lagrange multiplier associated with the
same condition as previously (the sum of the components of $p$ is equal to
one) and $\lambda_r$, with $r > 0$, is the Lagrange multiplier associated with the
condition $\sum_{i=1}^{n} p_i\, g_{r,i}(x) = a_r$. Then, the Lagrangian reads as

\[ L(p, \lambda) = S(p) - \lambda_0 \Big( \sum_{i=1}^{n} p_i - 1 \Big) - \sum_{r=1}^{m} \lambda_r \Big( \sum_{i=1}^{n} p_i\, g_{r,i} - a_r \Big) \qquad [2.13] \]

and the PME yields that:

\[ \frac{\partial L}{\partial p_i} = 0 \implies -\ln(p_i) - \lambda_0 - \sum_{r=1}^{m} \lambda_r\, g_{r,i} = 1, \qquad [2.14] \]

so that:

\[ p_i = \exp\Big( -\lambda_0 - 1 - \sum_{r=1}^{m} \lambda_r\, g_{r,i} \Big), \quad i = 1, \ldots, n. \qquad [2.15] \]

The multipliers are determined by replacing $p$ from [2.15] in [2.2] and [2.11]:
we have

\[ \sum_{i=1}^{n} \exp\Big( -\lambda_0 - 1 - \sum_{j=1}^{m} \lambda_j\, g_{j,i} \Big) = 1 \qquad [2.16] \]

and

\[ \sum_{i=1}^{n} g_{r,i} \exp\Big( -\lambda_0 - 1 - \sum_{j=1}^{m} \lambda_j\, g_{j,i} \Big) = a_r, \quad r = 1, \ldots, m. \qquad [2.17] \]

It is interesting to note that these equations extend straightforwardly to the situation
where $X$ takes countably many values by replacing $n$ by $+\infty$ in
equations [2.15]-[2.17]. Example 2.2 illustrates such a situation.

In the examples below, we illustrate the use of these equations in some 
particular situations. 

Example 2.1.- Let us consider a discrete random variable $X$ taking a finite
number $n$ of values $x = \{x_1, \ldots, x_n\}$. Let us assume that the mean of $X$ is
known and has the value $\mu$. In this case, the admissible set is (recall that we
assume that all the probabilities are strictly positive)

\[ C = \Big\{ p : \sum_{i=1}^{n} p_i = 1 \ ; \ \sum_{i=1}^{n} x_i\, p_i = \mu \Big\}. \qquad [2.18] \]

Applying the PME, we look for a solution $p \in C$ such that $S(p) \ge S(q)$,
$\forall q \in C$. This situation corresponds to

\[ m = 1 \ ; \quad g_{1,i}(x) = x_i \ ; \quad a_1 = \mu. \]

Thus, we have two Lagrange multipliers associated with $C$: $\lambda = (\lambda_0, \lambda_1)$.
The Lagrangian is

\[ L(p, \lambda) = -\sum_{i=1}^{n} p_i \ln(p_i) - \lambda_0 \Big( \sum_{i=1}^{n} p_i - 1 \Big) - \lambda_1 \Big( \sum_{i=1}^{n} x_i\, p_i - \mu \Big) \qquad [2.19] \]

and we have

\[ \frac{\partial L}{\partial p_i} = 0 \implies -\ln(p_i) - \lambda_0 - \lambda_1 x_i = 1. \qquad [2.20] \]

Equation [2.15] shows that

\[ p_i = \exp(-\lambda_0 - \lambda_1 x_i - 1), \quad i = 1, \ldots, n. \qquad [2.21] \]

The values of the Lagrange multipliers are determined by using equations
[2.16]-[2.17]:

\[ \sum_{i=1}^{n} \exp(-\lambda_0 - \lambda_1 x_i - 1) = 1 \ ; \quad \sum_{i=1}^{n} x_i \exp(-\lambda_0 - \lambda_1 x_i - 1) = \mu. \qquad [2.22] \]

These equations may be solved by standard methods for nonlinear
algebraic systems, such as Newton-Raphson. ■
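As an illustration of how equations [2.22] may be solved, the MATLAB sketch below treats the values $x = \{1, \ldots, 6\}$ with prescribed mean $\mu = 4.5$ (both are assumptions chosen for the example). It eliminates $\lambda_0$ through the normalization condition, so that a scalar Newton-Raphson iteration on $\lambda_1$ is enough; this reduction is one possible approach, not necessarily the one used by the authors.

```matlab
% Maximum entropy distribution on x = {1,...,6} with prescribed mean mu.
x  = 1:6;                                 % values of X (assumed)
mu = 4.5;                                 % prescribed mean (assumed)

lambda1 = 0;                              % initial guess: uniform distribution
for it = 1:50
    w = exp(-lambda1 * x);
    p = w / sum(w);                       % lambda0 eliminated using sum(p) = 1
    m = sum(x .* p);                      % mean for the current lambda1
    v = sum(x.^2 .* p) - m^2;             % variance, equal to -d(m)/d(lambda1)
    step = (m - mu) / v;                  % Newton-Raphson step for m(lambda1) = mu
    lambda1 = lambda1 + step;
    if abs(step) < 1e-14, break; end
end

w = exp(-lambda1 * x);
p = w / sum(w);                           % solution, of the form [2.21]
lambda0 = log(sum(w)) - 1;                % from exp(-lambda0 - 1) = 1 / sum(w)
fprintf('lambda0 = %.4f, lambda1 = %.4f\n', lambda0, lambda1);
disp(p);
```

Since $\mu$ is larger than the mean $3.5$ of the uniform distribution, $\lambda_1$ comes out negative and the probabilities increase with $x_i$.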

Example 2.2.- Let $X$ be a discrete random variable taking infinitely many
values given by the natural numbers: $x = \{0, 1, 2, \ldots\}$. Let us assume that the
mean of $X$ is known and has the value $\mu > 0$: since we assume that all the
$p_i$ are strictly positive, the mean of such a variable must be strictly positive,
and this condition about the sign of $\mu$ appears as a condition for the existence of a
solution in the developments below. Here, the admissible set is (recall that we
still assume that all the probabilities are strictly positive)

\[ C = \Big\{ p : \sum_{i=0}^{+\infty} p_i = 1 \ ; \ \sum_{i=0}^{+\infty} x_i\, p_i = \mu \Big\}. \qquad [2.23] \]

In this situation, we have, analogously to example 2.1,

\[ m = 1 \ ; \quad g_{1,i}(x) = x_i \ ; \quad a_1 = \mu \]

and we also have two Lagrange multipliers associated with $C$: $\lambda = (\lambda_0, \lambda_1)$.
The Lagrangian is analogous to the one of example 2.1:

\[ L(p, \lambda) = -\sum_{i=0}^{+\infty} p_i \ln(p_i) - \lambda_0 \Big( \sum_{i=0}^{+\infty} p_i - 1 \Big) - \lambda_1 \Big( \sum_{i=0}^{+\infty} i\, p_i - \mu \Big). \qquad [2.24] \]

We have

\[ \frac{\partial L}{\partial p_i} = 0 \implies -\ln(p_i) - \lambda_0 - \lambda_1 i = 1, \qquad [2.25] \]

which leads to equations analogous to equations [2.15]-[2.17], with $n$ replaced
by $+\infty$.

Since $x_i = i$, we have, from the analog of equation [2.15]:

\[ p_i = \exp(-\lambda_0 - 1 - \lambda_1 i), \quad i = 0, 1, 2, \ldots \qquad [2.26] \]

Let us introduce

\[ \alpha = \exp(-\lambda_0 - 1) \ ; \quad \beta = \exp(-\lambda_1). \]

We have $\alpha > 0$ and $\beta > 0$. In addition, $p_i$ reads as

\[ p_i = \alpha \beta^i. \]

By using the analog of equation [2.16], we have

\[ 1 = \sum_{i=0}^{+\infty} p_i = \alpha \sum_{i=0}^{+\infty} \beta^i. \]

This equality implies that $\beta < 1$: if $\beta \ge 1$, then $\sum_{i=0}^{+\infty} \beta^i = +\infty$ and,
since $\alpha > 0$, the equality leads to $1 = +\infty$, which is a contradiction. Thus,
$0 < \beta < 1$ and we have

\[ 1 = \sum_{i=0}^{+\infty} p_i = \alpha \sum_{i=0}^{+\infty} \beta^i = \frac{\alpha}{1 - \beta}. \qquad [2.27] \]

By using the analog of equation [2.17], we have:

\[ \mu = \sum_{i=0}^{+\infty} i\, p_i = \alpha \sum_{i=0}^{+\infty} i\, \beta^i. \]

We observe that this equality cannot be satisfied if $\mu \le 0$. Since $0 < \beta < 1$:

\[ \sum_{i=0}^{+\infty} i\, \beta^i = \beta \sum_{i=0}^{+\infty} i\, \beta^{i-1} = \beta \frac{d}{d\beta} \sum_{i=0}^{+\infty} \beta^i = \beta \frac{d}{d\beta} \Big( \frac{1}{1 - \beta} \Big) = \frac{\beta}{(1 - \beta)^2}, \]

so that (using equation [2.27])

\[ \mu = \frac{\alpha \beta}{(1 - \beta)^2} = \frac{\beta}{1 - \beta}. \]

Thus,

\[ \beta = \frac{\mu}{1 + \mu}, \quad \lambda_1 = -\ln\Big( \frac{\mu}{1 + \mu} \Big) \qquad [2.28] \]

and

\[ \alpha = 1 - \frac{\mu}{1 + \mu} = \frac{1}{1 + \mu}, \qquad [2.29] \]

\[ \lambda_0 = -1 - \ln\Big( \frac{1}{1 + \mu} \Big). \qquad [2.30] \]

So,

\[ p_i = \frac{1}{1 + \mu} \Big( \frac{\mu}{1 + \mu} \Big)^i, \quad i = 0, 1, 2, \ldots \ ■ \qquad [2.31] \]
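The closed form [2.31] is easy to verify numerically. In the MATLAB sketch below, the value $\mu = 2.5$, the truncation of the infinite sums at $i = 2000$ and the comparison with a Poisson distribution of the same mean are assumptions made for illustration; the comparison shows one admissible distribution whose entropy is lower.

```matlab
% Check of p_i = (1/(1+mu)) * (mu/(1+mu))^i: normalization, mean and entropy.
mu = 2.5;                                     % prescribed mean (assumed)
i  = 0:2000;                                  % truncation of {0, 1, 2, ...}

beta  = mu / (1 + mu);
alpha = 1 / (1 + mu);
p = alpha * beta.^i;                          % equation [2.31]

total = sum(p);                               % should be close to 1
m     = sum(i .* p);                          % should be close to mu

% Entropy comparison with another distribution having the same mean
S = @(q) -sum(q(q > 0) .* log(q(q > 0)));     % Shannon entropy, with 0 ln 0 = 0
q = exp(-mu + i * log(mu) - gammaln(i + 1));  % Poisson(mu) probabilities
fprintf('sum = %.12f, mean = %.12f\n', total, m);
fprintf('S(maxent) = %.4f, S(Poisson) = %.4f\n', S(p), S(q));
```

The distribution [2.31] is the geometric distribution on the natural numbers; the Poisson distribution satisfies the same mean constraint but has a smaller entropy, as expected from the PME.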


2.2.2. Continuous random variables 

The PME is more frequently used for the determination of the distribution
of continuous random variables. In the case of a continuous variable $X$ having
a probability density $p : \mathbb{R} \to \mathbb{R}$, Shannon's entropy is given by

\[ S(p) = -\int_{\mathbb{R}} p(x) \ln p(x)\, dx \quad (\text{by convention } 0 \ln 0 = 0). \qquad [2.32] \]

As previously observed, the support of the variable is an essential
piece of information: for a continuous variable, we may consider three situations: a
finite interval $(a, b)$, a semi-infinite interval (for instance, $[a, \infty)$), or the set
of all the real numbers $(-\infty, \infty)$. Since we adopt the convention $0 \ln 0 = 0$,
the integral in equation [2.32] reduces to an integral on the support. For
instance, if the support is $(a, b)$, we have:

\[ S(p) = -\int_a^b p(x) \ln p(x)\, dx. \qquad [2.33] \]

Analogously to the discrete situation, let us examine the case of a variable
$X$ taking its values on a finite interval $(a, b)$, for which no other information is
available. In this case, the admissible set is:

\[ C = \Big\{ p : (a, b) \to \mathbb{R} \ : \ \int_a^b p(x)\, dx = 1 \ ; \ p(x) \ge 0 \text{ on } (a, b) \Big\}. \qquad [2.34] \]

In this situation, we have two Lagrange multipliers $\lambda = (\lambda_0, \lambda_1(x))$ such
that:

\[ L(p, \lambda) = -\int_a^b p(x) \ln p(x)\, dx - \lambda_0 \Big( \int_a^b p(x)\, dx - 1 \Big) - \int_a^b \lambda_1(x)\, p(x)\, dx. \]

Thus:

\[ L(p, \lambda) = -\int_a^b h(p, \lambda)\, dx + \lambda_0, \qquad [2.35] \]

where:

\[ h(p, \lambda) = p(x)\, [\ln p(x) + \lambda_0 + \lambda_1(x)]. \qquad [2.36] \]

As in the discrete situation, we have

\[ \forall x \in (a, b) : \ \lambda_1(x) \ge 0 \ ; \quad \lambda_1(x) = 0, \ \text{if } p(x) > 0. \]

Let us assume that $p(x) > 0$ on $(\alpha, \beta) \subset (a, b)$. From variational calculus,
we have

\[ \frac{\partial}{\partial p(x)} h(p, \lambda) = 0 \quad \text{on } (\alpha, \beta). \qquad [2.37] \]

Maximum Entropy and Information 145 


Thus:

\[ p(x) = p_0 \ ; \quad p_0 = \exp(-\lambda_0 - 1) \quad \text{on } (\alpha, \beta). \qquad [2.38] \]

So, $p(x)$ is constant on its support. Let us denote by $A = \mathrm{supp}(p) = \{ x \in (a, b) : p(x) > 0 \}$. We have

\[ 1 = \int_A p(x)\, dx = \int_A p_0\, dx = p_0\, \mathrm{meas}(A), \]

so that

\[ p(x) = \frac{1}{\mathrm{meas}(A)} \ \text{if } x \in A \ ; \quad p(x) = 0 \ \text{if } x \notin A. \]

So, analogously to the discrete situation, the PME furnishes as result the
family of all the uniformly distributed variables having their support $A \subset (a, b)$.
By adding the supplementary information that $A = (a, b)$, we obtain a
unique solution

\[ p(x) = 1_{[a,b]}(x)\, p_0 \ ; \quad p_0 = \frac{1}{b - a}. \qquad [2.39] \]

We observe that:

\[ \frac{\partial^2 h}{\partial p(x)^2} = \frac{1}{p(x)} > 0, \qquad [2.40] \]

so that this extremal point corresponds to a maximum of $S$. Analogously to the
discrete situation, we may also consider $\Omega = (\omega_a, \omega_b)$ and look for a
probability on $\Omega$: in this case, the solution will be the family of uniform
distributions on subsets of $\Omega$ and, analogously to the discrete situation, the
distribution of $X$ may differ from a uniform distribution. Finally, we
observe that, as in the discrete case, considering unbounded sets, such as
$(a, +\infty)$ or $\mathbb{R}$, results in the set of all the uniform distributions on bounded
non-empty parts of the original set. Here again, additional information about $X$
may keep a solution possible (see examples 2.4, 2.5 and 2.6).

We observe again that the results furnished by PME are closely connected
to the information used, namely the support of the distribution.
In the following, we limit ourselves to the situation where $p(x) > 0$ on the
set under consideration. Thus, we do not consider Lagrange multipliers
associated with the non-negativity condition of the probabilities: they are
assumed to be null in the following.

It is common to consider situations involving information on statistics of
the distribution to be determined, such as its mean or some moments: assume
that the random variable $X$ satisfies $m$ restrictions written in the form:

\[ \int_a^b p(x)\, g_r(x)\, dx = a_r, \quad r = 1, \ldots, m. \]

In this situation, the admissible set is

\[ C = \Big\{ p : (a, b) \to \mathbb{R} \ : \ \int_a^b p(x)\, dx = 1 \ ; \ \int_a^b p(x)\, g_r(x)\, dx = a_r, \ r = 1, \ldots, m \Big\} \]

and we consider $m + 1$ Lagrange multipliers $\lambda = (\lambda_0, \lambda_1, \ldots, \lambda_m)$ and the
Lagrangian becomes:

\[ L(p, \lambda) = S(p) - \lambda_0 \Big( \int_a^b p(x)\, dx - 1 \Big) - \sum_{r=1}^{m} \lambda_r \Big( \int_a^b p(x)\, g_r(x)\, dx - a_r \Big). \qquad [2.41] \]

Equation [2.41] assumes the form:

\[ L(p, \lambda) = -\int_a^b h(p, \lambda)\, dx + \lambda_0 + \sum_{r=1}^{m} \lambda_r a_r, \qquad [2.42] \]

with:

\[ h(p, \lambda) = p(x) \Big[ \ln p(x) + \lambda_0 + \sum_{r=1}^{m} \lambda_r\, g_r(x) \Big]. \qquad [2.43] \]

From variational calculus:

\[ \frac{\partial}{\partial p(x)} h(p, \lambda) = 0 \qquad [2.44] \]

and we have:

\[ p(x) = 1_{[a,b]}(x) \exp\Big( -\lambda_0 - 1 - \sum_{r=1}^{m} \lambda_r\, g_r(x) \Big). \qquad [2.45] \]

The Lagrange multipliers are determined by using equation [2.45] in [2.41]:
we have

\[ \int_a^b \exp\Big( -\lambda_0 - 1 - \sum_{r=1}^{m} \lambda_r\, g_r(x) \Big)\, dx = 1 \qquad [2.46] \]

and

\[ \int_a^b g_r(x) \exp\Big( -\lambda_0 - 1 - \sum_{j=1}^{m} \lambda_j\, g_j(x) \Big)\, dx = a_r, \quad r = 1, \ldots, m. \qquad [2.47] \]

Example 2.3.- Let us consider a continuous random variable X having as 
support (a, b), such that $E[X] = \mu$ and $E[X^2] = \eta^2$. In this case, we have 
$m = 2$, $g_1(x) = x$, $g_2(x) = x^2$, $a_1 = \mu$, $a_2 = \eta^2$. Thus, we consider 
$\lambda = (\lambda_0, \lambda_1, \lambda_2)$. From equation [2.45]:

$$p(x) = 1_{[a,b]}(x)\, \exp( -\lambda_0 - 1 - \lambda_1 x - \lambda_2 x^2 ).$$

It is convenient to rewrite this equality as

$$p(x) = \alpha \exp( -\beta (x - \gamma)^2 ),$$   [2.48]

where

$$\alpha = \exp\Big( -\lambda_0 - 1 + \frac{\lambda_1^2}{4 \lambda_2} \Big), \quad \beta = \lambda_2, \quad \gamma = -\frac{\lambda_1}{2 \lambda_2}.$$   [2.49]

The values of the multipliers are determined by solving equations [2.46]-[2.47] 
for the unknowns $\alpha$, $\beta$, $\gamma$. These equations read as

$$\alpha \int_a^b \exp( -\beta (x - \gamma)^2 )\, dx = 1, \quad \alpha \int_a^b x \exp( -\beta (x - \gamma)^2 )\, dx = \mu, \quad \alpha \int_a^b x^2 \exp( -\beta (x - \gamma)^2 )\, dx = \eta^2.$$   [2.50]

Equation [2.50] may be solved by methods adapted to the solution of 
nonlinear algebraic equations, such as Newton-Raphson. ■ 
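
The numerical solution mentioned in example 2.3 can be sketched as follows. This is only an illustrative Python/NumPy sketch, not the authors' implementation: the function name `maxent_two_moments`, the trapezoidal quadrature, the grid size and the starting point (the uniform density) are all assumptions. Instead of solving [2.50] for $(\alpha, \beta, \gamma)$, the sketch applies Newton-Raphson to the two moment conditions in terms of $(\lambda_1, \lambda_2)$ and lets the normalization [2.46] fix $\lambda_0$; $\beta$ and $\gamma$ are then recovered from [2.49].

```python
import numpy as np

def maxent_two_moments(a, b, mu, eta2, n=4001, tol=1e-12, max_iter=100):
    """Maximum-entropy density on (a, b) with E[X] = mu and E[X^2] = eta2.

    Returns (lam1, lam2, x, p), where p is proportional to
    exp(-lam1*x - lam2*x**2) on the grid x (hypothetical helper).
    """
    x = np.linspace(a, b, n)
    w = np.full(n, (b - a) / (n - 1))    # trapezoidal quadrature weights
    w[0] *= 0.5
    w[-1] *= 0.5
    G = np.vstack([x, x**2])             # g_1(x) = x and g_2(x) = x^2 on the grid
    target = np.array([mu, eta2])
    lam = np.zeros(2)                    # assumed start: the uniform density

    def density(lam):
        e = np.exp(-(lam @ G))
        return e / np.sum(w * e)         # normalisation plays the role of lambda_0

    for _ in range(max_iter):
        p = density(lam)
        moments = G @ (w * p)
        residual = moments - target
        if np.max(np.abs(residual)) < tol:
            break
        # The Jacobian of the moments with respect to lam is minus the
        # covariance matrix of (g_1(X), g_2(X)), hence the "+" in the update.
        cov = (G * (w * p)) @ G.T - np.outer(moments, moments)
        lam = lam + np.linalg.solve(cov, residual)
    return lam[0], lam[1], x, density(lam)

# Usage: support (0, 1), mean 0.5, second moment 0.30
lam1, lam2, x, p = maxent_two_moments(0.0, 1.0, mu=0.5, eta2=0.30)
beta, gamma = lam2, -lam1 / (2.0 * lam2)   # parameters of [2.48]-[2.49]
```

The plain Newton iteration is used without damping; for data far from the uniform starting point a line search would probably be needed.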

Example 2.4.- Let us consider a continuous random variable X supported 
by $(0, \infty)$ and having a known mean $E[X] = \mu > 0$. In this case, we have 
$m = 1$, $g_1(x) = x$, $a_1 = \mu$.

From equation [2.45]:

$$p(x) = \exp( -\lambda_0 - 1 - \lambda_1 x ).$$   [2.51]

Let us denote $\alpha = \exp(-\lambda_0 - 1)$. Then $p(x) = \alpha \exp(-\lambda_1 x)$ and we 
have, from equation [2.46]:

$$1 = \alpha \int_0^{\infty} \exp(-\lambda_1 x)\, dx = \frac{\alpha}{\lambda_1},$$

so that $\alpha = \lambda_1$ (note that $\lambda_1 > 0$: otherwise, the equality is impossible). Thus,

$$\mu = \lambda_1 \int_0^{\infty} x \exp(-\lambda_1 x)\, dx = \frac{1}{\lambda_1},$$

so that

$$p(x) = 1_{[0,\infty)}(x)\, \frac{1}{\mu} \exp\Big( -\frac{x}{\mu} \Big).$$   [2.52]

Example 2.5.- Let us consider a continuous random variable X supported 
by $(0, \infty)$ for which the following additional information is available: $E[X] = \mu > 0$ 
and $E[\ln(X)] = q$. This situation corresponds to $m = 2$, $g_1(x) = x$, 
$g_2(x) = \ln(x)$, $a_1 = \mu$, $a_2 = q$.

From equation [2.45]:

$$p(x) = \exp( -\lambda_0 - 1 - \lambda_1 x - \lambda_2 \ln(x) ).$$   [2.53]

Let us denote $\alpha = \exp(-\lambda_0 - 1)$. Then $p(x) = \alpha\, x^{-\lambda_2} \exp(-\lambda_1 x)$. This 
expression may be compared with the classical $\Gamma$ distribution:

$$p(x) = \frac{x^{a-1} \exp( -x/\theta )}{\theta^{a}\, \Gamma(a)}.$$

Thus, the solution of equations [2.46]-[2.47] leads to a Gamma 
distribution:

$$p(x) = 1_{]0,+\infty[}(x)\, \frac{1}{\mu} \Big( \frac{1}{\delta^2} \Big)^{1/\delta^2} \frac{1}{\Gamma(1/\delta^2)} \Big( \frac{x}{\mu} \Big)^{1/\delta^2 - 1} \exp\Big( -\frac{x}{\delta^2 \mu} \Big),$$   [2.54]

with ($\sigma$ being the standard deviation of X):

$$\delta = \frac{\sigma}{\mu}; \quad \Gamma(a) = \int_0^{\infty} t^{a-1} \exp(-t)\, dt.$$

$\Gamma(a)$ is referred to as the Gamma function. ■ 

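Example 2.6.- Let us consider a continuous random variable X having as 
support $(-\infty, \infty)$ for which two moments are given: $E[X] = \mu$ and $E[X^2] = \eta^2$. 
The situation is similar to the situation considered in example 2.3: $m = 2$, 
$g_1(x) = x$, $g_2(x) = x^2$, $a_1 = \mu$, $a_2 = \eta^2$ (only the support is different). As in 
example 2.3, we have:

$$p(x) = \alpha \exp( -\beta (x - \gamma)^2 ),$$   [2.55]

where $\alpha$, $\beta$ and $\gamma$ are the solutions of equations [2.46]-[2.47], which read as

$$\alpha \int_{-\infty}^{+\infty} \exp( -\beta (x - \gamma)^2 )\, dx = 1, \quad \alpha \int_{-\infty}^{+\infty} x \exp( -\beta (x - \gamma)^2 )\, dx = \mu, \quad \alpha \int_{-\infty}^{+\infty} x^2 \exp( -\beta (x - \gamma)^2 )\, dx = \eta^2.$$   [2.56]

Since (see [KAP 92]):

$$\int_{-\infty}^{+\infty} \exp( -\beta (x - \gamma)^2 )\, dx = \sqrt{\frac{\pi}{\beta}} \quad \text{for } \beta > 0,$$   [2.57]

we have:

$$\alpha = \frac{1}{\sqrt{2\pi}\, \sigma}, \quad \beta = \frac{1}{2 \sigma^2}, \quad \gamma = \mu,$$   [2.58]

with $\sigma^2 = \eta^2 - \mu^2$. Thus,

$$p(x) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp\Big( -\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2} \Big).$$   [2.59]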


So, the PME furnishes as solution a Gaussian density. ■ 

In the following, we summarize some classical results derived from the 
PME. 


Continuous variables with bounded support (a, b): 

We recall below some classical results furnished by the PME under the 
assumption that X is a continuous random variable having as support (a, b): 

1) If no additional information is given, the PME furnishes as solution the 
uniform distribution on (a, b):

$$p(x) = 1_{[a,b]}(x)\, \frac{1}{b - a}.$$   [2.60]

2) If the mean of X is given: $E[X] = \mu$, then the solution is a truncated 
exponential distribution:

$$p(x) = 1_{[a,b]}(x)\, \exp( -\lambda_0 - \lambda_1 x ),$$   [2.61]

where:

$$e^{-\lambda_0} \int_a^b e^{-\lambda_1 x}\, dx = 1, \quad e^{-\lambda_0} \int_a^b x\, e^{-\lambda_1 x}\, dx = \mu.$$   [2.62]


3) If a = 0, b = 1, and we have $E[\ln(X)] = k_1$ and $E[\ln(1 - X)] = k_2$, 
then the PME furnishes a beta distribution:

$$p(x) = 1_{[0,1]}(x)\, \frac{1}{B(m, n)}\, x^{m-1} (1 - x)^{n-1},$$   [2.63]

where B is the beta function:

$$B(m, n) = \frac{\Gamma(m)\, \Gamma(n)}{\Gamma(m + n)},$$   [2.64]

and $\Gamma(m) = \int_0^{\infty} t^{m-1} \exp(-t)\, dt$ is the Gamma function. The values of m 
and n are determined from the equations $E[\ln(X)] = k_1$ and $E[\ln(1 - X)] = k_2$:

$$\frac{1}{B(m, n)} \int_0^1 x^{m-1} (1 - x)^{n-1} \ln(x)\, dx = E[\ln(X)] = k_1,$$   [2.65]

$$\frac{1}{B(m, n)} \int_0^1 x^{m-1} (1 - x)^{n-1} \ln(1 - x)\, dx = E[\ln(1 - X)] = k_2.$$   [2.66]

Continuous variables with support $(0, \infty)$: 

Here, we collect some classical results furnished by the PME under the 
assumption that X is a continuous random variable having as support $(0, \infty)$: 

1) If no additional information about X is given, the PME does not admit 
a solution. 

2) If the mean of X is known, $E[X] = \mu$, the solution is the exponential 
distribution:

$$p(x) = 1_{[0,\infty)}(x)\, \frac{1}{\mu}\, e^{-x/\mu}.$$   [2.67]

3) If the available information consists of the mean of X and the mean of 
$\ln(X)$: $E[X] = \mu$ and $E[\ln(X)] = q$, the solution is the Gamma distribution:

$$p(x) = 1_{[0,+\infty)}(x)\, \frac{1}{\mu} \Big( \frac{1}{\delta^2} \Big)^{1/\delta^2} \frac{1}{\Gamma(1/\delta^2)} \Big( \frac{x}{\mu} \Big)^{1/\delta^2 - 1} \exp\Big( -\frac{x}{\delta^2 \mu} \Big),$$   [2.68]

where $\Gamma$ is the Gamma function ($\Gamma(a) = \int_0^{\infty} t^{a-1} \exp(-t)\, dt$) and $\delta = \sigma / \mu$. 

4) If $E[\ln(X)] = k_1$ and $E[\ln(1 + X)] = k_2$, the PME provides as result:

$$p(x) = 1_{]0,+\infty)}(x)\, \frac{1}{B(m, n)}\, x^{m-1} (1 + x)^{-(n+m)},$$   [2.69]

where B is the beta function [2.64] and the values of m and n are determined 
from the equations $E[\ln(X)] = k_1$ and $E[\ln(1 + X)] = k_2$. Thus:

$$\frac{1}{B(m, n)} \int_0^{\infty} x^{m-1} (1 + x)^{-(n+m)} \ln(x)\, dx = E[\ln(X)] = k_1,$$   [2.70]

$$\frac{1}{B(m, n)} \int_0^{\infty} x^{m-1} (1 + x)^{-(n+m)} \ln(1 + x)\, dx = E[\ln(1 + X)] = k_2.$$   [2.71]


Continuous variables having as support $(-\infty, \infty)$: 

Here, we recall some classical results furnished by the PME under the 
assumption that X is a continuous random variable having as support 
$(-\infty, \infty)$: 

1) If no additional information about X is given, the PME does not admit 
a solution. 

2) If the only information is the mean of X, the PME does not admit a 
solution. 





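3) If $E[X] = \mu$ and $E[X^2] = \eta^2$, the PME furnishes a Gaussian density 
(with $\sigma^2 = \eta^2 - \mu^2$, as in example 2.6):

$$p(x) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp\Big( -\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2} \Big).$$   [2.72]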


2.2.3. Random vectors 

In this section, we consider a random vector $X = (X_1, \dots, X_n)$ of 
dimension n formed by n random variables $X_i$, $i = 1, \dots, n$. The cumulative 
distribution function of such a vector is:

$$P : \mathbb{R}^n \to [0, 1], \quad x \mapsto P(X_1 < x_1, X_2 < x_2, \dots, X_n < x_n),$$   [2.73]

where $x \in \mathbb{R}^n$. In the following, we denote $dx = dx_1 \cdots dx_n$. Its density of 
probability p is given by:

$$p(x) = \frac{\partial^n}{\partial x_1\, \partial x_2 \cdots \partial x_n}\, P(X_1 < x_1, X_2 < x_2, \dots, X_n < x_n),$$   [2.74]

sometimes referred to as the joint distribution of the vector 
$X = (X_1, \dots, X_n)$. 

Shannon's entropy of a random vector X is:

$$S(p) = -\int_{\mathbb{R}^n} p(x) \ln p(x)\, dx.$$   [2.75]

Analogously to the preceding situations, p verifies:

$$p(x) \ge 0, \quad \int_{\mathbb{R}^n} p(x)\, dx = 1.$$   [2.76]

Analogously to the preceding situations, we denote by $\lambda_0$ the Lagrange 
multiplier associated with the condition $\int_{\mathbb{R}^n} p(x)\, dx = 1$ and we limit the 
following to the situation where the support of X is given and the Lagrange 
multiplier associated with the condition $p(x) \ge 0$ is identically null. In 
addition, we consider that this support is the whole space $\mathbb{R}^n$. 

Let us consider the situation where m additional pieces of information on X 
are given in the form:

$$\int_{\mathbb{R}^n} p(x)\, g_r(x)\, dx = a_r \in \mathbb{R}^{\nu_r}, \quad r = 1, \dots, m,$$   [2.77]

with $\nu_1, \dots, \nu_m$ being all strictly positive integers. 

Let C be the space of the maps $x \mapsto p(x)$ from $\mathbb{R}^n$ onto $\mathbb{R}^{\ge 0}$, having 
all the same support $k_n \subset \mathbb{R}^n$ (possibly, $k_n = \mathbb{R}^n$) and satisfying equations 
[2.76]-[2.77]:

$$C = \{ p : \mathbb{R}^n \to \mathbb{R}^{\ge 0} \;:\; \mathrm{supp}(p) = k_n;\ p \text{ satisfies [2.76]-[2.77]} \}.$$


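The PME reads as

$$p = \arg\max \{ S(q) : q \in C \}.$$   [2.78]

As previously, we introduce the Lagrange multipliers $\lambda = (\lambda_0, \dots, \lambda_m)$, 
where $\lambda_0$ is the Lagrange multiplier associated with equation [2.76] and $\lambda_i$, 
with i > 0, is the Lagrange multiplier associated with equation [2.77] with 
r = i. Then, the Lagrangian reads as

$$L(p, \lambda) = S(p) - \lambda_0 \Big( \int_{\mathbb{R}^n} p(x)\, dx - 1 \Big) - \sum_{r=1}^{m} \Big\langle \lambda_r,\ \int_{\mathbb{R}^n} p(x)\, g_r(x)\, dx - a_r \Big\rangle_{\mathbb{R}^{\nu_r}},$$   [2.79]

where $\langle u, v \rangle_{\mathbb{R}^{\nu_r}} = u_1 v_1 + \dots + u_{\nu_r} v_{\nu_r}$ denotes the Euclidean scalar 
product in $\mathbb{R}^{\nu_r}$. 

Equation [2.79] may be written as:

$$L(p, \lambda) = \int_{\mathbb{R}^n} h(p(x), \lambda)\, dx,$$   [2.80]

with:

$$h(p, \lambda) = p(x) \Big[ \ln p(x) + \lambda_0 + \sum_{r=1}^{m} \langle \lambda_r, g_r(x) \rangle_{\mathbb{R}^{\nu_r}} \Big].$$   [2.81]

From variational calculus:

$$\frac{\partial}{\partial p(x)}\, h(p, \lambda) = 0,$$   [2.82]

which gives (the additive constant 1 arising from the derivative being absorbed into $\lambda_0$):

$$p(x) = 1_{k_n}(x)\, \exp\Big( -\lambda_0 - \sum_{r=1}^{m} \langle \lambda_r, g_r(x) \rangle_{\mathbb{R}^{\nu_r}} \Big).$$   [2.83]

So, analogously to the preceding situations, we have:

$$\int_{k_n} \exp\Big( -\lambda_0 - \sum_{r=1}^{m} \langle \lambda_r, g_r(x) \rangle_{\mathbb{R}^{\nu_r}} \Big)\, dx = 1,$$   [2.84]

$$\int_{k_n} g_r(x)\, \exp\Big( -\lambda_0 - \sum_{i=1}^{m} \langle \lambda_i, g_i(x) \rangle_{\mathbb{R}^{\nu_i}} \Big)\, dx = a_r, \quad r = 1, \dots, m.$$   [2.85]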

Example 2.7.- Let us consider a random vector X having dimension n and 
supported by $k_n = [a_1, b_1] \times \dots \times [a_n, b_n]$. If no other information is available, 
equation [2.83] shows that

$$p(x) = p_0\, 1_{k_n}(x); \quad p_0 = \exp(-\lambda_0),$$   [2.86]

so that p(x) is constant on $k_n$. Using [2.84]:

$$p_0 \int_{k_n} dx = 1$$   [2.87]

and we have

$$p_0 = \frac{1}{\mathrm{meas}(k_n)}.$$

Thus, the solution is the uniform distribution:

$$p(x) = 1_{k_n}(x)\, \frac{1}{\mathrm{meas}(k_n)}.$$   [2.88]


Example 2.8.- Let us consider X having dimension n, support 
$k_n = (0, +\infty) \times \dots \times (0, +\infty) \subset \mathbb{R}^n$ and a known mean $E[X] = \mu_X \in k_n$. 
This situation corresponds to $m = 1$, $\nu_1 = n$, $a_1 = \mu_X$ and $g_1(x) = x$. The 
solution furnished by the PME is

$$p(x) = p_{X_1}(x_1) \times \dots \times p_{X_n}(x_n),$$   [2.89]

where $p_{X_j}$ is an exponential density [2.52]:

$$p_{X_j}(x_j) = 1_{(0,\infty)}(x_j)\, \frac{1}{\mu_{X_j}} \exp\Big( -\frac{x_j}{\mu_{X_j}} \Big), \quad j = 1, \dots, n.$$   [2.90]

In this case, the PME furnishes as solution independent random variables. 


Example 2.9.- Let us consider X of dimension n, with support $k_n = \mathbb{R}^n$, 
known mean $E[X] = \mu_X \in k_n$, and a given covariance matrix C (a real 
symmetric positive definite matrix such that $\mathrm{Cov}(X_i, X_j) = C_{ij}$). In this 
situation, the PME furnishes as solution the Gaussian distribution:

$$p(x) = \frac{1}{\sqrt{(2\pi)^n \det(C)}}\, \exp\Big\{ -\frac{1}{2} \big\langle C^{-1} (x - \mu_X),\ (x - \mu_X) \big\rangle_{\mathbb{R}^n} \Big\}.$$   [2.91]


2.2.4. Random matrices 

In the previous sections, we have used the PME for determining the 
distributions of random variables and vectors in situations where some 
information about these variables is given. We observe that the method 
extends directly to the determination of the distribution of random matrices 
(see [SOI 00]). 

Let us denote by: 

- $M_n = M(n, n)$: the set of all the square real matrices n x n; 

- $M_n^S = M^S(n, n)$: the set of all the real symmetric matrices n x n; 

- $M_n^{+0} = M^{+0}(n, n)$: the set of all the real symmetric matrices n x n 
which are, in addition, positive semi-definite; 

- $M_n^{+} = M^{+}(n, n)$: the set of all the real symmetric matrices n x n which 
are, in addition, positive definite; 

- $M_n^D = M^D(n, n)$: the set of all the real diagonal matrices n x n. 

We have

$$M_n^{+} \subset M_n^{+0} \subset M_n^S \subset M_n.$$   [2.92]

The norm of a matrix is given by:

$$\|A\| = \sup_{\|x\| \le 1} \|[A] x\|, \quad x \in \mathbb{R}^n,$$   [2.93]

and its Frobenius norm (or Hilbert-Schmidt norm) is:

$$\|A\|_F^2 = \mathrm{tr}\{ [A]^T [A] \} = \sum_{j=1}^{n} \sum_{k=1}^{n} [A]_{jk}^2.$$   [2.94]

We have $\|A\| \le \|A\|_F \le \sqrt{n}\, \|A\|$. 

Let us introduce the element of volume $dA_n$ given by:

$$dA_n = 2^{n(n-1)/4} \prod_{1 \le i \le j \le n} d[A_n]_{ij}.$$   [2.95]

Let $A_n$ be a random matrix taking its values on $M_n^{+}$, which has as density 
of probability $p(\cdot)$, such that

$$A_n \mapsto p(A_n), \text{ from } M_n^{+}(\mathbb{R}) \text{ to } \mathbb{R}^{+} = [0, +\infty).$$   [2.96]

Analogously to the preceding situations, the PME may be used in order 
to generate the distribution p. For instance, let us suppose that the following 
information is known about $A_n$: 

1) $A_n \in M_n^{+}$ and $\int_{M_n^{+}} p(A_n)\, dA_n = 1$; 

2) the mean of $A_n$ is given:

$$E(A_n) = \int_{M_n^{+}} A_n\, p(A_n)\, dA_n = \underline{A}_n;$$   [2.97]

3) $\int_{M_n^{+}} \ln(\det A_n)\, p(A_n)\, dA_n = \nu$, with $|\nu| < +\infty$. 

Analogously to the preceding situations, Shannon's entropy reads as:

$$S(p) = -\int_{M_n^{+}} p(A_n) \ln p(A_n)\, dA_n$$   [2.98]

and the admissible set C is formed by all the densities of probability p from 
$M_n^{+}$ to $\mathbb{R}^{+}$ verifying all these conditions. 

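In order to apply the PME, some algebraic manipulations are required. Let 
us decompose the mean $\underline{A}_n$ by the Cholesky decomposition:

$$\underline{A}_n = \underline{L}_{A_n}^T\, \underline{L}_{A_n},$$   [2.99]

where $\underline{L}_{A_n}$ is an upper triangular matrix. Then, the PME furnishes:

$$A_n = \underline{L}_{A_n}^T\, G_n\, \underline{L}_{A_n},$$   [2.100]

where $G_n$ is a random matrix taking its values on $M_n^{+}$ having mean equal to 
the identity matrix $Id_n$:

$$\underline{G}_n = E(G_n) = Id_n.$$   [2.101]

The probability density of $G_n$ is denoted by $q(\cdot)$. We have:

$$q(G_n) = 1_{M_n^{+}}(G_n) \times C_{G_n} \times (\det G_n)^{(n+1)(1 - \delta_A^2)/(2 \delta_A^2)} \times \exp\Big\{ -\frac{n+1}{2 \delta_A^2}\, \mathrm{tr}(G_n) \Big\},$$   [2.102]

where $C_{G_n}$ is a normalizing constant:

$$C_{G_n} = \frac{ (2\pi)^{-n(n-1)/4} \Big( \dfrac{n+1}{2 \delta_A^2} \Big)^{n(n+1)(2 \delta_A^2)^{-1}} }{ \prod_{j=1}^{n} \Gamma\Big( \dfrac{n+1}{2 \delta_A^2} + \dfrac{1-j}{2} \Big) }.$$   [2.103]

$\delta_A$ is a dispersion parameter given by:

$$\delta_A = \Big\{ \frac{ E\big[ \| G_n - \underline{G}_n \|_F^2 \big] }{ \| \underline{G}_n \|_F^2 } \Big\}^{1/2},$$   [2.104]

with $0 < \delta_A < \sqrt{\dfrac{n+1}{n+5}}$. 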

In addition, the matrix $G_n$ is positive definite and may be decomposed by 
Cholesky:

$$G_n = L_n^T\, L_n,$$   [2.105]

where $L_n$ is an upper triangular matrix taking its values on $M_n$. With this 
decomposition, the elements of $L_n$ are independent and characterized by: 

- if $j < j'$: $[L_n]_{jj'} = \sigma_n U_{jj'}$, where $\sigma_n = \delta_A (n+1)^{-1/2}$ and $U_{jj'}$ is a real 
Gaussian random variable having mean equal to 0 and variance equal to 1; 

- if $j = j'$: $[L_n]_{jj} = \sigma_n \sqrt{2 V_j}$, where $V_j$ is a real positive random variable, 
Gamma distributed, having as density of probability:

$$p_{V_j}(v) = 1_{\mathbb{R}^{+}}(v)\, \frac{1}{\Gamma\Big( \dfrac{n+1}{2 \delta_A^2} + \dfrac{1-j}{2} \Big)}\, v^{\frac{n+1}{2 \delta_A^2} - \frac{1+j}{2}}\, e^{-v}.$$   [2.106]

2.3. Generating samples of random variables, random vectors and 
stochastic processes 

As mentioned in the beginning of the chapter, once deterministic and 
stochastic models (e.g. using PME) have been generated, the Monte Carlo 
method can be applied in order to furnish information and statistics about the 
response of the system (Figure 2.1). 

However, the implementation of the Monte Carlo method involves the 
generation of samples, i.e. variates from a given probability distribution. For 
instance, assume that the PME has furnished a density of probability for the 
parameters chosen to be considered as random: the practical application of 
the Monte Carlo method needs the generation of variates from these 
distributions. 

So, we need to provide methods for the generation of random variables, 
vectors or matrices, as well as for stochastic processes and fields. 

This topic has been extensively treated in the literature, and a large number 
of works consider the construction of random number generators (RNGs). In 
general, some qualities are expected from an RNG: 

1) rapidity, for the generation of large numbers of variates; 

2) controlled reproducibility, in order to make possible comparisons and 
debugging; 

3) knowledge of its probabilistic and statistical properties; 

4) an extremely large period; 

5) pseudo-randomization, namely for the targeted applications. 

Early in the development of generators, many hybrid techniques mixing 
analog and digital systems were used, for instance, electronic circuits 
generating white noise. However, these methods showed some 
disadvantages: they were not as rapid as necessary, were not easily 
reproducible, tended to produce bias and required highly specialized equipment. 
Therefore, these techniques were replaced over the years. Today, virtually all 
generators are based on algorithms. 

Since computers are expected to be deterministic devices, it may seem 
impossible to use them as random generators. In fact, computer-generated 
random samples are only pseudo-random (see [SHO 09]), since they are 
obtained through deterministic algorithms. However, they appear as random 
and reproduce random distributions: the main generators available on most 
computers have been subjected to rigorous testing in order to confirm 
their quality. 

In [KNU 98], the author describes his personal attempt to build a 
generator of random numbers. He proposed an algorithm that worked with 
10-digit decimal numbers, generating a sequence of values $\{x_i\}_{i \in \mathbb{N}}$ from an 
initial value. 

Despite the apparent complication, Knuth found that his algorithm could 
quickly converge to a fixed point. In addition, the tests showed that, even when 
using different initial values, the sequence of numbers quickly began to repeat 
itself (so, the period was relatively small). 

The algorithm of Knuth is an example showing that complexity in the construction 
of the algorithm does not guarantee that the resulting random generator will 
have all the expected qualities. Frequently, the complexity hides the period, i.e. 
the rapidity with which the generated numbers repeat. Trivially, we can say 
that random samples should not be generated by a method chosen at random 
(see [KNU 98]). 

Many methods for generating samples of random variables and vectors 
have been developed over recent years. Examples include the popular 
generators based on linear congruence, the inverse transform method, or 
Markov chain Monte Carlo (MCMC), among others. 
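
To fix ideas about two of the methods just named, the following Python sketch combines a linear congruential generator with the inverse transform method in order to produce variates from the exponential density [2.52]. It is only an illustration: the class name `LCG`, the function name `exponential_variate`, the seed and the multiplier, increment and modulus (a commonly quoted 32-bit parameter set) are assumptions, and such a simple generator is not meant to replace the tested generators shipped with numerical libraries.

```python
import math

class LCG:
    """Linear congruential generator x_{k+1} = (a * x_k + c) mod m (illustrative)."""

    def __init__(self, seed=12345, a=1664525, c=1013904223, m=2**32):
        self.a, self.c, self.m = a, c, m
        self.state = seed % m

    def uniform(self):
        """Next pseudo-random number in [0, 1)."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state / self.m

def exponential_variate(gen, mu):
    """Inverse transform: F(x) = 1 - exp(-x / mu), hence x = -mu * ln(1 - u)."""
    return -mu * math.log(1.0 - gen.uniform())

# Usage: the same seed reproduces the same sample (controlled reproducibility)
gen = LCG(seed=2024)
sample = [exponential_variate(gen, mu=2.0) for _ in range(100000)]
sample_mean = sum(sample) / len(sample)   # should be close to mu = 2
```

Since `uniform()` returns values in [0, 1), the argument of the logarithm stays in (0, 1] and the variates are non-negative.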

This chapter will not discuss the construction of RNGs. Readers interested 
in this topic may consult [ROS 06, DEV 86] and [RUB 08]. 

We focus on the generation of samples from stochastic processes and fields, 
for which the generation process is a bit more complicated, since they are 
infinite-dimensional objects. In the following, we present the Karhunen-Loeve 
approach for the generation of realizations of stochastic processes. 

2.4. Karhunen-Loeve expansions and numerical generation of variates 
from stochastic processes 

The Monte Carlo method requires the generation of samples formed by 
variates from a random variable, vector or process: Monte Carlo approaches 
use random sampling as a tool to produce observations which may be used in 
order to perform statistical analysis or inference, with the aim of extracting 
information about quantities of interest (see [SHO 09]). 

The generation of large samples as required by the Monte Carlo approach 
may be expensive. The Karhunen-Loeve expansion provides a 
parameterization of a stochastic process that enables its approximation 
through a finite-dimensional process, i.e. by a random vector. Such an 
approximation can be used, for example, to reduce the computational cost of 
generating variates from the process under consideration. 


2.4.1. Karhunen-Loeve expansions 

Let us consider a stochastic process X, indexed by time and defined on a 
time interval T = (a, b). We denote by $\mu_X$ its mean and by C(s, t) its 
covariance function (see section 1.16). We consider the following eigenvalue 
problem:

$$\text{find } (\lambda, \psi) \text{ such that } \psi \ne 0 \text{ and } \int_T C(t, s)\, \psi(s)\, ds = \lambda\, \psi(t), \ \forall t \in T.$$   [2.107]

The eigenvectors $\psi_i$ are normalized in order to have a mean square norm 
equal to unity. Since C is positive, the eigenvalues are always positive: 
$\lambda_i > 0$, $\forall i \in \mathbb{N}$. We assume that this problem has a countable set of solutions 
$\{(\lambda_i, \psi_i) : i \in \mathbb{N}\}$, which are positive and may be ordered decreasingly: 
$0 < \lambda_{i+1} \le \lambda_i$, $\forall i \in \mathbb{N}$. In general, the inequality is not strict (see, for 
instance, example 2.11). 

We define a sequence $\{Z_i\}_{i \in \mathbb{N}}$ of random variables by

$$Z_i = \frac{1}{\sqrt{\lambda_i}} \int_T [X_t - \mu_X(t)]\, \psi_i(t)\, dt, \quad \forall i \in \mathbb{N}.$$   [2.108]

We have

$$E[Z_i] = 0 \quad \text{and} \quad E[Z_i Z_j] = \delta_{ij}, \quad \forall i, j \in \mathbb{N}.$$   [2.109]

The Karhunen-Loeve expansion of X is:

$$X_t = \mu_X(t) + \sum_{i=1}^{\infty} \sqrt{\lambda_i}\, \psi_i(t)\, Z_i.$$   [2.110]

Thus, the Karhunen-Loeve expansion furnishes a particular way for the 
decomposition of a stochastic process into an infinite sum of time-dependent 
terms (the eigenfunctions $\psi_i$) and a sequence of random variables $Z_i$. In 
practice, Karhunen-Loeve expansions are approximated by finite sums 
involving d terms:

$$X_t \approx \mu_X(t) + \sum_{i=1}^{d} \sqrt{\lambda_i}\, \psi_i(t)\, Z_i.$$   [2.111]

Obviously, such a truncation requires some reflection about the choice of 
d, the number of terms to be used in order to obtain a given quality. This 
question may be answered by considering one of the main characteristics of 
Karhunen-Loeve expansions: the value of $\lambda_i$ also gives its relative contribution 
to the expansion: the smaller the eigenvalue, the smaller the contribution of 
the term (recall that both $\psi_i$ and $Z_i$ are normalized). Since the eigenvalues 
are arranged in decreasing order, the contribution decreases with i. Thus, we 
may evaluate the error by considering the remaining eigenvalues $\lambda_i$, i > d: 
the mean square norm of the neglected terms is equal to their sum. 
Examples of applications can be found in [TRI 05, BEL 06, 
BEL 09, SAM 07, SAM 08, SAM 10, DOR 12, CAT 09, MAU 12]. 

In the following, we illustrate Karhunen-Loeve expansions for some 
particular stochastic processes. 

Example 2.10.- Let us consider a stochastic process $X_t$, indexed by the 
parameter $t \in T = [-b, b]$ and having as covariance function C the 
exponential one:

$$C(t_1, t_2) = \exp\Big( -\frac{|t_1 - t_2|}{a} \Big), \quad a > 0.$$   [2.112]

In this situation, the eigenfunctions $\psi_i$ and the eigenvalues $\lambda_i$ are obtained 
by solving equation [2.107], which reads as:

$$\int_{-b}^{b} \exp( -c\, |t_1 - t_2| )\, \psi_i(t_2)\, dt_2 = \lambda_i\, \psi_i(t_1),$$   [2.113]

where c = 1/a. Equation [2.113] may be written as:

$$\int_{-b}^{t_1} \exp( -c (t_1 - t_2) )\, \psi_i(t_2)\, dt_2 + \int_{t_1}^{b} \exp( -c (t_2 - t_1) )\, \psi_i(t_2)\, dt_2 = \lambda_i\, \psi_i(t_1).$$   [2.114]

By differentiating equation [2.114] twice with respect to $t_1$ (and using 
some algebraic manipulations), we obtain a differential equation for $\psi_i$, which 
leads to the following eigenvalues $\lambda_i$ (see [XIU 10]):

$$\lambda_i = \begin{cases} \dfrac{2a}{1 + a^2 \omega_i^2}, & \text{for } i = 2, 4, 6, \dots \\[2mm] \dfrac{2a}{1 + a^2 v_i^2}, & \text{for } i = 1, 3, 5, \dots \end{cases}$$   [2.115]

and eigenfunctions $\psi_i$ (see [XIU 10]):

$$\psi_i(t) = \begin{cases} \dfrac{\sin(\omega_i t)}{\sqrt{b - \dfrac{\sin(2 \omega_i b)}{2 \omega_i}}}, & \text{for } i = 2, 4, 6, \dots \\[4mm] \dfrac{\cos(v_i t)}{\sqrt{b + \dfrac{\sin(2 v_i b)}{2 v_i}}}, & \text{for } i = 1, 3, 5, \dots \end{cases}$$   [2.116]

where $\omega_i$ and $v_i$ satisfy the equations below:

$$\begin{cases} a \omega + \tan(\omega b) = 0, & \text{for } i = 2, 4, 6, \dots \\ 1 - a v \tan(v b) = 0, & \text{for } i = 1, 3, 5, \dots \end{cases}$$   [2.117]

In Figure 2.2, we exhibit the first 20 eigenvalues $\lambda_i$ for different values of 
a. ■ 
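
The eigenvalues of example 2.10 can be evaluated numerically by solving the transcendental equations [2.117] and inserting the roots in [2.115]. The Python sketch below is an illustration under assumptions that are not part of the text: the function names are hypothetical, the equations are multiplied by $\cos(\cdot)$ to avoid the poles of the tangent, and the i-th root is bracketed in the interval $((i-1)\pi/(2b),\ i\pi/(2b))$, where the corresponding function changes sign exactly once, before applying bisection.

```python
import math

def bisect(f, lo, hi, iters=200):
    """Plain bisection; f(lo) and f(hi) are assumed to have opposite signs."""
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (flo > 0) == (fmid > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exponential_kl_eigenvalues(a, b, count):
    """First `count` eigenvalues [2.115] of C(t1, t2) = exp(-|t1 - t2| / a) on [-b, b]."""
    eigenvalues = []
    for i in range(1, count + 1):
        lo = (i - 1) * math.pi / (2.0 * b)
        hi = i * math.pi / (2.0 * b)
        if i % 2 == 1:
            # 1 - a v tan(v b) = 0, multiplied by cos(v b)
            f = lambda v: math.cos(v * b) - a * v * math.sin(v * b)
        else:
            # a w + tan(w b) = 0, multiplied by cos(w b)
            f = lambda w: a * w * math.cos(w * b) + math.sin(w * b)
        freq = bisect(f, lo, hi)
        eigenvalues.append(2.0 * a / (1.0 + (a * freq) ** 2))
    return eigenvalues

# Usage: first 20 eigenvalues for a = 1 on T = [-1, 1]
lam = exponential_kl_eigenvalues(a=1.0, b=1.0, count=20)
```

Since the trace of the covariance operator is $\int_{-b}^{b} C(t, t)\, dt = 2b$, the partial sums of the computed eigenvalues must stay below 2b, which provides a simple consistency check.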


Example 2.11.- Let us consider a stochastic process $X_t$, indexed by a 
parameter t defined by $t \in T = [-b, b]$ and having as covariance function C 
the orthonormal one:

$$C(t_1, t_2) = \delta(t_1 - t_2).$$   [2.118]

In this case, equation [2.107] admits as a solution any family $\{(\lambda_i, \psi_i) : i \in \mathbb{N}\}$, 
such that $\{\psi_i : i \in \mathbb{N}\}$ is a family of orthonormal functions and 
$\lambda_i = 1$, $\forall i \in \mathbb{N}$. In this case, the eigenvalues are not strictly decreasing. ■ 

Figure 2.2. First 20 eigenvalues $\lambda_i$ determined for the exponential covariance 
function with different values of a 


Note 2.1.- Karhunen-Loeve expansion of Gaussian stochastic processes: a 
particular situation often found and, thus, having a particular interest is the 
one where both $X_t$ and $\{Z_i : i \in \mathbb{N}\}$ are Gaussian. In this case, equation 
[2.110] corresponds to the sum of uncorrelated normalized Gaussian 
variables, thus to independent Gaussian variables. So, the Karhunen-Loeve 
expansion corresponds to the approximation of a Gaussian process by sums of 
independent Gaussian variables with time-dependent coefficients. ■ 


2.4.2. Numerical determination of Karhunen-Loeve expansions 

In the previous examples, we have considered analytical expressions of the 
terms involved in the Karhunen-Loeve expansion, derived from the analytical 
expression of the covariance function of the stochastic process X. In this 
section, we consider the situation where such an expression is not available 
and we have only observations connected to a time discretization. 

For instance, let us assume that T = (a, b) and consider that a finite number 
n of time instants is given: $\tau = (t_1, \dots, t_n)^t$ such that $a = t_1 < t_2 < \dots < t_n = b$. 
In practice, data are often sampled at fixed time intervals and we 
have $t_{j+1} - t_j = \Delta t$, $\Delta t = (b - a)/(n - 1)$. 

For each $t_j$ in this set, we consider $X_j = X_{t_j}$: associated with the time 
instants, we have the random variables $X_1, X_2, \dots, X_n$. A realization of the 
stochastic process at these instants consists of a vector of variates: 
$X_m = ((X_1)_m, \dots, (X_n)_m)^t$, where $1 \le m \le M$ is an integer that indexes the 
realizations: M is the number of available realizations. 

For M and n large enough, approximated realizations of $X_t$, as well as estimations 
of the mean, variance, autocorrelation and covariance of the stochastic process 
X, may be obtained (see [WIR 95]). For instance, we may use an interpolation 
of the discrete values $X_m$ in order to generate an approximated realization 
$X_m(t)$. An example is provided by the simple linear interpolation

$$X_m(t) \approx (X_j)_m + \frac{(X_{j+1})_m - (X_j)_m}{t_{j+1} - t_j}\, (t - t_j), \quad t_j \le t < t_{j+1}.$$   [2.119]

The mean $\mu_X$ may be estimated by $\hat{\mu}_X$ defined as follows:

$$\hat{\mu}_X(t_j) = \frac{1}{M} \sum_{m=1}^{M} (X_j)_m, \quad j = 1, \dots, n.$$   [2.120]

These estimates generate a vector $\hat{\mu} = (\hat{\mu}_X(t_1), \dots, \hat{\mu}_X(t_n))^t$, which is an 
estimation of $\mu_\tau = (\mu_X(t_1), \dots, \mu_X(t_n))^t$. Analogously to the realizations of 
the stochastic process itself, an estimation $\hat{\mu}_X(t)$ is obtained from these values 
by using these discrete values in order to generate a function defined on T; for 
instance, we may use an interpolation or other approximation procedures. 

An analogous procedure may be used in order to obtain an estimation $\hat{\sigma}_X^2$ 
of $\sigma_X^2$ (see 1.27):

$$\hat{\sigma}_X^2(t_j) = \frac{1}{M} \sum_{m=1}^{M} \big( (X_j)_m - \hat{\mu}_X(t_j) \big)^2, \quad j = 1, \dots, n.$$   [2.121]

Analogously to the mean, an estimation $\hat{\sigma}_X^2(t)$ is obtained from these values 
by using these discrete values in order to generate a function defined on T. 

The situation is entirely analogous for the autocorrelation or the covariance. 
We generate

$$\hat{R}(t_j, t_k) = \frac{1}{M} \sum_{m=1}^{M} (X_j)_m\, (X_k)_m,$$   [2.122]

$$\hat{C}(t_j, t_k) = \frac{1}{M} \sum_{m=1}^{M} \big( (X_j)_m - \hat{\mu}_X(t_j) \big) \big( (X_k)_m - \hat{\mu}_X(t_k) \big).$$   [2.123]

These quantities may be used for the construction of continuous functions 
$\hat{R}(s, t)$ and $\hat{C}(s, t)$, which estimate R(s, t) and C(s, t), respectively. 

2.4.2.1. Estimating the eigenvalues and eigenfunctions of the covariance matrix 

In practice, we may determine estimations of the eigenvalues and 
eigenvectors of the covariance matrix without going through the construction 
of $\hat{C}(s, t)$ defined in the previous section: let us represent the M realizations 
$X_m$, $1 \le m \le M$, by using two n x M matrices X and $\mu$, such that the mth 
column of X is $X_m$ and all the columns of $\mu$ are equal to $\hat{\mu}$:

$$X = ( X_1 \ X_2 \ \cdots \ X_M ), \quad \mu = ( \hat{\mu} \ \hat{\mu} \ \cdots \ \hat{\mu} );$$   [2.124]

$$X = \begin{pmatrix} (X_1)_1 & (X_1)_2 & \cdots & (X_1)_M \\ \vdots & \vdots & & \vdots \\ (X_n)_1 & (X_n)_2 & \cdots & (X_n)_M \end{pmatrix}, \quad \mu = \begin{pmatrix} \hat{\mu}_X(t_1) & \cdots & \hat{\mu}_X(t_1) \\ \vdots & & \vdots \\ \hat{\mu}_X(t_n) & \cdots & \hat{\mu}_X(t_n) \end{pmatrix}.$$   [2.125]

Then, we consider

$$X_0 = X - \mu$$   [2.126]

and may estimate the covariance function C(s, t) by using the matrix

$$\hat{C} = \frac{1}{M}\, X_0\, X_0^t,$$   [2.127]

which gives the estimates of $C(t_i, t_j)$ for pairs $(t_i, t_j)$ from $\tau$:

$$\hat{C} = \begin{pmatrix} \hat{C}(t_1, t_1) & \cdots & \hat{C}(t_1, t_n) \\ \vdots & & \vdots \\ \hat{C}(t_n, t_1) & \cdots & \hat{C}(t_n, t_n) \end{pmatrix}.$$   [2.128]

This matrix is used as follows: equation [2.107] is approximated as

$$\hat{C}\, \psi\, \Delta t = \lambda\, \psi.$$   [2.129]

The solution of equation [2.129] provides n eigenvalues $\hat{\lambda}_i$ and their 
associated eigenvectors $(\hat{\psi}_i)_\tau$ (normalized to the unity norm). These quantities 
estimate the n first eigenvalues $\lambda_i$ and the values 
$(\psi_i(t_1), \dots, \psi_i(t_n))^t$, which may be used for the construction of an 
estimate $\hat{\psi}_i$ of $\psi_i$ on T, for $1 \le i \le n$ (for instance, by using an interpolation 
procedure). 

Example 2.12.- Let us consider the stochastic process:

$$X_t = A_1 t + A_2,$$   [2.130]

where $A = (A_1, A_2)^t$ is a Gaussian vector having the mean and covariance 
given by:

$$\mu_A = \begin{pmatrix} 1 \\ 2 \end{pmatrix}; \quad C_A = \begin{pmatrix} 1 & 1/2 \\ 1/2 & 1 \end{pmatrix}.$$   [2.131]

In this situation, $X_t$ is a linear combination of the components of a Gaussian 
vector: thus, $X_t$ is a Gaussian variable and, so, the process is Gaussian. As 
a result, for any $t_j \in T$, the probability density $p_j = p_{t_j}$ is Gaussian. Such a 
process may be represented as follows:

$$X_t = \alpha^t A; \quad \alpha = (t, 1)^t.$$   [2.132]

We have:

$$\mu_X(t) = \alpha^t \mu_A = t + 2,$$   [2.133]

$$\sigma_X^2(t) = \alpha^t C_A\, \alpha = t^2 + t + 1.$$   [2.134]

As a result, the probability density $p_{t_j}$ is known:

$$p_{t_j}(x) = \frac{1}{\sqrt{2\pi}\, \sqrt{t_j^2 + t_j + 1}}\, \exp\Big( -\frac{(x - (t_j + 2))^2}{2 (t_j^2 + t_j + 1)} \Big).$$   [2.135]

Consequently, this process is first-order defined. 

Using the probability density given in equation [2.135], we may generate 
realizations: for instance, we have generated $10^4$ realizations on the interval 
T = (0, 10), with $\Delta t = 0.1$. These data have been used for the estimation of $\hat{C}$ 
using equation [2.127] and for the determination of the approximated 
eigenvalues which satisfy equation [2.129]. 

The results are shown in Figure 2.3. ■ 
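
The estimation chain [2.120], [2.126], [2.127] and [2.129] can be sketched in a few lines for the process [2.130]. The Python/NumPy code below is an illustration, not the computation that produced Figure 2.3: the seed, the variable names and, in particular, the way the realizations are generated (by sampling the Gaussian vector A with mean (1, 2) and the covariance read from [2.131], and then forming $A_1 t + A_2$ on the grid) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
M, dt = 10_000, 0.1
t = np.arange(0.0, 10.0 + dt / 2, dt)             # grid on T = (0, 10), n = 101 points

mu_A = np.array([1.0, 2.0])                       # mean of A, cf. [2.131]
C_A = np.array([[1.0, 0.5], [0.5, 1.0]])          # covariance of A, cf. [2.131]
A = rng.multivariate_normal(mu_A, C_A, size=M)    # M realisations of (A_1, A_2)

X = np.outer(t, A[:, 0]) + A[:, 1]                # n x M matrix, column m is X_m  [2.124]
mu_hat = X.mean(axis=1, keepdims=True)            # estimate of the mean          [2.120]
X0 = X - mu_hat                                   # centred data                  [2.126]
C_hat = X0 @ X0.T / M                             # covariance matrix estimate    [2.127]

lam, psi = np.linalg.eigh(C_hat * dt)             # discrete eigenproblem         [2.129]
order = np.argsort(lam)[::-1]                     # sort in decreasing order
lam, psi = lam[order], psi[:, order]              # columns of psi have unit norm
```

Because $X_t$ depends on only two random variables, the centred data matrix has rank two: only the first two estimated eigenvalues are significantly different from zero, and the remaining ones vanish up to rounding errors.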



Figure 2.3. Eigenvalues $\lambda_i$ estimated from the covariance matrix 
$\hat{C}$ of the stochastic process [2.130] 


Example 2.13.- Let us consider the stochastic process:

$$X(t, \omega) = A_1 \cos(\omega t) + A_2 \sin(\omega t),$$   [2.136]

where $A = (A_1, A_2)^t$ is a Gaussian vector having the mean and covariance 
given by equation [2.131]. Analogously to the preceding example, $X_t$ is 
Gaussian and the process is Gaussian. We have:

$$X_t = \alpha^t A; \quad \alpha = \begin{pmatrix} \cos(\omega t) \\ \sin(\omega t) \end{pmatrix}.$$   [2.137]

In this case, the mean and the variance are:

$$\mu_X(t) = \alpha^t \mu_A = \cos(\omega t) + 2 \sin(\omega t),$$   [2.138]

$$\sigma_X^2(t) = \alpha^t C_A\, \alpha = \cos^2(\omega t) + \cos(\omega t) \sin(\omega t) + \sin^2(\omega t).$$   [2.139]

Here, the probability density $p_{t_j}$ is:

$$p_{t_j}(x) = \frac{1}{\sqrt{2\pi}\, \sqrt{\sigma_X^2(t_j)}}\, \exp\Big( -\frac{(x - \mu_X(t_j))^2}{2 \sigma_X^2(t_j)} \Big)$$   [2.140]

and this process is also first-order defined. 

As in the preceding situation, the probability density given in equation 
[2.140] has been used in order to generate $10^4$ realizations on the interval 
T = (0, 10), with $\Delta t = 0.1$. The data collected have been used for the 
estimation of $\hat{C}$ using equation [2.127] and for the determination of the 
approximated eigenvalues which satisfy equation [2.129]. 

The results are shown in Figure 2.4. ■ 

2.4.2.2. Using the estimations for the generation of an approximated 
Karhunen-Loeve expansion 

Once the estimates of $\lambda_i$ and $(\psi_i)_\tau$ are obtained, we have to generate the 
elements $Z_i$ to be used in the Karhunen-Loeve expansion. Namely, we must 
determine their distributions of probability. From equation [2.108], we may 
generate an approximated realization of $Z_i$ by using:

$$(Z_i)_m = \frac{1}{\sqrt{\hat{\lambda}_i}}\, \big\langle X_m - \hat{\mu},\ (\hat{\psi}_i)_\tau \big\rangle_{\mathbb{R}^n}.$$   [2.141]

Figure 2.4. Eigenvalues $\lambda_i$ estimated from the covariance matrix $\hat{C}$ of the 
stochastic process [2.136] 

We may collect the realizations of $Z_i$ in a vector $Z_i = ((Z_i)_1, \dots, (Z_i)_M)^t$. 
We have:

$$Z_i = \frac{1}{\sqrt{\hat{\lambda}_i}}\, X_0^t\, (\hat{\psi}_i)_\tau,$$   [2.142]

where the matrix $X_0$ is defined in equation [2.126] and has dimension 
(n x M). This equality shows that the samples $X_m$, $1 \le m \le M$, provide M 
realizations of $Z_i$: for M large enough, we may use these values in order to 
determine the distribution of $Z_i$. For instance, we may generate a histogram in 
order to get an approximate probability density. 

Once this step is performed, we may consider an approximate Karhunen-Loeve 
expansion involving d terms ($d \le n$), given by:

$$X_t \approx \hat{\mu}(t) + \sum_{i=1}^{d} \sqrt{\hat{\lambda}_i}\, \hat{\psi}_i(t)\, Z_i.$$   [2.143]

Thus, for a given stochastic process X, the procedure consists of 
generating a sample X; estimating the mean $\hat{\mu}$ and the covariance matrix $\hat{C}$ 
(equations [2.120] and [2.123] or [2.127]); determining the eigenvalues $\hat{\lambda}_i$ 
and their associated eigenvectors $(\hat{\psi}_i)_\tau$ (equation [2.129]); generating the 
continuous approximations $\hat{\psi}_i$ and $\hat{\mu}$; finding the variables $Z_i$ 
and their probability densities (for instance, by using equation [2.142]). 
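
The whole procedure can be sketched as follows in Python/NumPy. This is a hedged illustration rather than the authors' code: the function names are hypothetical, the test sample (cumulated Gaussian increments) is an arbitrary choice, and the factor $\Delta t$ in the computation of the $Z_i$ is an assumption on the normalization, namely that the eigenvectors are rescaled so that $\sum_j \hat{\psi}_i(t_j)^2 \Delta t = 1$, which is the discrete counterpart of the unit mean square norm used in [2.108].

```python
import numpy as np

def kl_from_samples(X, dt):
    """X: n x M array, column m holding realisation X_m on a uniform grid."""
    M = X.shape[1]
    mu_hat = X.mean(axis=1, keepdims=True)           # [2.120]
    X0 = X - mu_hat                                  # [2.126]
    lam, psi = np.linalg.eigh(X0 @ X0.T / M * dt)    # [2.127] and [2.129]
    order = np.argsort(lam)[::-1]
    lam, psi = lam[order], psi[:, order]
    psi = psi / np.sqrt(dt)       # assumed rescaling: sum(psi_i**2) * dt = 1
    return mu_hat, lam, psi, X0

def kl_variables(X0, lam, psi, dt, d):
    """Rows are Z_1..Z_d, columns the M realisations (discrete form of [2.108])."""
    return (psi[:, :d].T @ X0) * dt / np.sqrt(lam[:d])[:, None]

def kl_reconstruct(mu_hat, lam, psi, Z):
    """Truncated expansion [2.143] evaluated on the grid, using d = Z.shape[0] terms."""
    d = Z.shape[0]
    return mu_hat + (psi[:, :d] * np.sqrt(lam[:d])) @ Z

# Usage on a hypothetical sample: paths built from cumulated Gaussian increments
rng = np.random.default_rng(1)
n, M, dt = 50, 2000, 0.02
X = np.cumsum(rng.standard_normal((n, M)) * np.sqrt(dt), axis=0)
mu_hat, lam, psi, X0 = kl_from_samples(X, dt)
Z = kl_variables(X0, lam, psi, dt, d=10)
X_d = kl_reconstruct(mu_hat, lam, psi, Z)
```

By construction, the rows of `Z` have zero sample mean and a sample covariance equal to the identity, in agreement with [2.109]; a histogram of each row then gives an approximate density of the corresponding $Z_i$.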


Example 2.14.- Assume that we are interested in the generation of
realizations from the stochastic process X, indexed by the parameter $t \in T =
(0, 10)$, such that:

1) $X_0 = 0$;

2) $X_t$, t > 0, is a stochastic process that has independent increments;

3) the increments $Y = X_{t_2} - X_{t_1}$, with $t_2 > t_1 \ge 0$, are Gamma
distributed, have a mean $\mu_Y = m(t_2 - t_1)$ and a coefficient of variation
$\delta_Y = \delta / \sqrt{t_2 - t_1}$, with m > 0 and $\delta > 0$ given.


We have generated the approximated Karhunen-Loève expansion by using
the procedure indicated and a sample of $10^4$ variates from X on T = (0, 10),
obtained with different values of $\Delta t$.

The results obtained for the first 20 eigenvalues $\lambda_i$ are shown in Figure 2.5.



Figure 2.5. Estimations of the eigenvalues $\lambda_i$ of the matrix C for different
values of $\Delta t$. For a color version of the figure, see
www.iste.co.uk/souzadecursi/quantification.zip

The estimated probability densities of the variables $Z_i$ are shown in Figure
2.6: they were obtained by using the histograms associated with equation
[2.142].






Figure 2.6. Estimations of the probability densities of the first four random
variables $Z_i$ in the Karhunen-Loève expansion. For a color version of the
figure, see www.iste.co.uk/souzadecursi/quantification.zip


Then, samples from $Z_i$ were generated by using the MCMC method, which
provided a sample from X.

We have studied the influence of d from two different approaches. In the
first approach, the covariance matrix $C_d$ has been estimated from the samples.
Then, we have considered the error


$$\text{Error} = \frac{\left|\, \|C_d\|_{Fr} - \|C\|_{Fr} \,\right|}{\|C\|_{Fr}}$$    [2.145]


This expression has been evaluated for different values of d. The results are 
shown in Figure 2.7. As we can observe, the error [2.145] decreases when d 
increases. 




Figure 2.7. Error [2.145] for different numbers of terms d 


In the second approach, we compare, for a given fixed t, the probability
density of $X_t$ and the histogram of $X_t^{(KL)}$.

Figures 2.8-2.9 exhibit results for t = 2 and d = 1, 2, 5, 10. We observe
that, when d increases, the histogram of $X_t^{(KL)}$ becomes closer to the correct
result $p_{X_2}$. ■







Figure 2.8. Densities $p_{X(t=2,\omega)}$ and histograms of $X^{KL}(t=2,\omega)$ for d = 1
and d = 2 terms. For a color version of the figure, see
www.iste.co.uk/souzadecursi/quantification.zip





Figure 2.9. Densities $p_{X(t=2,\omega)}$ and histograms of $X^{KL}(t=2,\omega)$ for d = 5
and d = 10 terms. For a color version of the figure, see
www.iste.co.uk/souzadecursi/quantification.zip

Representation of Random Variables


When considering random variables, the term representation is
generally used in the sense of approximation. Naively, the problem of
determining a representation for a random variable X using another random
variable U may be stated as follows:

Model Problem 3.1 (informal).- Let X and U be two random vectors
(possibly having different dimensions) on the probability space ($\Omega$, P). Let
$S \neq \emptyset$ be a set of functions of U. Determine an element PX from S such
that PX is the best approximation of X on S, i.e. PX is the element of S
that is the closest to X. ■

The main objective of a representation is to use PX instead of X in a
design procedure or in the evaluation of the probability of crucial events.
Ideally, we desire to get a representation (i.e. an approximation) such that
X - PX is equivalent to 0 (i.e. it is negligible and we have X = PX, almost
everywhere (a.e.)). In general, this aim cannot be achieved and we look for an
approximation such that PX may replace X in practice, i.e. such that the
difference X - PX is small enough in a sense corresponding to our purposes.

It is interesting to note that this formulation leads to an optimization
problem, since we must determine the element of S that is the closest to X
according to a proximity criterion: this is generally achieved by introducing a
function that measures the distance between the random variables and
determining its minimum on S. In order to obtain a precise formulation, it is
necessary to introduce formally precise definitions of the subspace S and of the
distance; this is also necessary in order to provide algorithms for the
numerical approximation.


3.1. Approximations based on Hilbertian properties 

When considering square summable variables ($X \in [L^2(\Omega,P)]^k$ and
$S \subset [L^2(\Omega,P)]^k$), we may exploit the Hilbertian structure of this set (see
sections 1.6, 1.11) in order to characterize the representation PX as an
orthogonal projection (see definition 1.5). In this case, the scalar product and
the norm are those of $[L^2(\Omega,P)]^k$:

$$(Z, Y) = E(Z.Y), \qquad \|Z\| = \sqrt{(Z, Z)}.$$

The orthogonal projection is the solution of the following problem. 

Problem 3.1.- Let $X = (X_1, \dots, X_k)$ and $U = (U_1, \dots, U_m)$ be two
random vectors taking their values on $\mathbb{R}^k$ and $\mathbb{R}^m$, respectively. Let $S \neq \emptyset$,
$S \subset [L^2(\Omega,P)]^k$ be a set of functions of U. Find

$$Z \in S \ \text{ and } \ \|Z - X\| = \min\{\|s - X\| : s \in S\}.$$ ■    [3.1]

The error in the approximation is measured by the distance between X and
S: dist(X, S) = $\|X - PX\|$.

As previously observed, we have to solve an optimization problem: find
the argument Z that minimizes $\|Z - X\|$ on S. With this formulation, the
theoretical framework of orthogonal projections may be applied, namely
the orthogonal projection theorem and its consequences (see proposition 1.8).
For instance, if S is a closed linear subspace, then PX is uniquely determined
and the numerical determination of PX may be performed by using
proposition 1.9. We have:

$$PX \in S \ \text{ and } \ X - PX \perp S,$$

so that:

$$PX \in S \ \text{ and } \ (X - PX, s) = 0, \ \forall s \in S.$$

Indeed, if dim(S) = n, we may consider a basis $\mathcal{B} = \{e_i : 1 \le i \le n\}
\subset L^2(\Omega,P)$ of the linear space S and take as unknowns the coefficients of
the expression of PX in this basis. In the case of a closed subset S, the
representation is exact (i.e. X - PX = 0) if and only if $X \in S$ (since
$X = PX \in S$). The last equation may be used in order to generate equations
leading to the complete determination of the coefficients of PX, by taking
successively $s = e_i$ for i = 1, ..., n.

In practice, S is often generated by considering a basis
$\mathcal{B} = \{\varphi_i : 1 \le i \le N_X\}$ ($N_X > 0$) $\subset L^2(\Omega,P)$ (instead of $[L^2(\Omega,P)]^k$) and
taking S as being the subspace of dimension $n = k \times N_X$ given by

$$S = \left[\{\varphi_1(U), \dots, \varphi_{N_X}(U)\}\right]^k =
\left\{ Z = \sum_{i=1}^{N_X} z_i \varphi_i(U) : z_i \in \mathbb{R}^k,\ 1 \le i \le N_X \right\}.$$

Then, an element $Z(U) = \sum_{i=1}^{N_X} z_i \varphi_i(U)$ is defined by the coefficients
$z = (z_1, \dots, z_{N_X})$, which may be arranged in a matrix $z \in \mathcal{M}(N_X, k)$ having
the coefficients of $z_j$ as j-th line, i.e.:

$$z = (z_{ij}), \qquad z_j = (z_{j1}, \dots, z_{jk}) \in \mathbb{R}^k.$$    [3.3]

Let us introduce:

$$\varphi(U) = (\varphi_1(U), \dots, \varphi_{N_X}(U)).$$    [3.4]

Then, we have:

$$Z(U) = \varphi(U).z.$$    [3.5]

Since $PX = \sum_{i=1}^{N_X} x_i \varphi_i \in S$, we may also write:

$$PX(U) = \varphi(U).x,$$    [3.6]

where $x \in \mathcal{M}(N_X, k)$ has the coefficients of $x_j$ as j-th line, i.e.:

$$x = (x_{ij}), \qquad x_i = (x_{i1}, \dots, x_{ik}) \in \mathbb{R}^k.$$    [3.7]
The unknowns to be determined are the coefficients forming x. Since
$X - PX \perp S$, we have:

$$(X - PX, Z) = 0, \ \forall Z \in S.$$

This equation reads as:

$$E\left((X - PX)^t Z\right) = 0, \ \forall Z \in S,$$

so that:

$$E\left((PX)^t Z\right) = E\left(X^t Z\right), \ \forall Z \in S.$$

By using equations [3.5]-[3.6], we obtain:

$$E\left(x^t \varphi^t \varphi z\right) = E\left(X^t \varphi z\right), \ \forall z \in \mathcal{M}(N_X, k),$$

which shows that:

$$x^t E\left(\varphi^t \varphi\right) z = E\left(X^t \varphi\right) z, \ \forall z \in \mathcal{M}(N_X, k).$$

Since z is arbitrary, we have:

$$x^t E\left(\varphi^t \varphi\right) = E\left(X^t \varphi\right).$$    [3.8]

Equation [3.8] forms a linear system that may be used in order to determine
the coefficients x. Examples and the Matlab implementation of this approach
are given in the following sections.
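
As an illustration of equation [3.8], the sketch below solves its transposed form $E(\varphi^t \varphi)\, x = E(\varphi^t X)$ with expectations replaced by empirical means. The choices U = sqrt(omega), X = (U^2, 1 + U) (so k = 2) and the basis (1, U, U^2) are hypothetical and serve only to make the expected coefficients obvious; the variable names are not those of the programs given later in the text.

```matlab
% Sketch: solve E(phi' phi) x = E(phi' X) (transposed [3.8]) from a sample.
ns  = 5000;
om  = rand(ns, 1);                 % sample of omega, uniform on (0,1)
Us  = sqrt(om);                    % hypothetical choice U = sqrt(omega)
Xs  = [Us.^2, 1 + Us];             % hypothetical X with k = 2 components
Phi = [ones(ns, 1), Us, Us.^2];    % row s is phi(U_s), here N_X = 3
A   = (Phi'*Phi)/ns;               % empirical E(phi' phi), N_X x N_X
B   = (Phi'*Xs)/ns;                % empirical E(phi' X),   N_X x k
x   = A \ B;                       % coefficients, N_X x k
PXs = Phi*x;                       % values of PX at the sample points
```

Since both components of this X belong to S, the computed x is, up to rounding, the matrix with rows (0, 1), (0, 1) and (1, 0), and PXs coincides with Xs.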

If S is not a finite-dimensional subspace, but it admits a Hilbertian basis
$\mathcal{F} = \{e_i : i \in \mathbb{N}^*\}$, we may still consider $PX = \sum_{i=1}^{+\infty} u_i e_i$ and
determine the coefficients $u = (u_1, u_2, \dots) \in \mathbb{R}^\infty$. From the mathematical
standpoint, all the coefficients may be determined, but the approximation is,
in practice, limited to a maximum number of terms n, which corresponds to
the situation where S is a finite-dimensional subspace. This situation extends
to the case where $\mathcal{F}$ is a countable total family.

3.1.1. Using the conditional expectation in order to generate a
representation

As previously observed in sections 1.6.5 and 1.11.1.4, the most general
approximation of X as a function of U is provided by the conditional mean or
conditional expectation. In this case, S is taken as the set of all the functions
of U that are square summable:

$$S = \left\{ Z : Z = \varphi(U),\ \varphi : \mathbb{R}^m \to \mathbb{R}^k,\ Z \in L^2(\Omega,P) \right\}.$$

The solution is the conditional mean of X with respect to U (sections 1.6.5,
1.11.1.4), PX = E(X | U) = g(U), given by:

$$g(u) = \int x\, f(x \mid U = u)\, dx,$$

where $f(x \mid U = u)$ is the conditional density of X with respect to U = u:

$$f(x \mid U = u) = f(x,u) \Big/ \int f(x,u)\, dx = f(x,u) / f_U(u).$$

Here, f is the density of the pair (X, U) (i.e. $P(X \in dx, U \in du) =
f(x,u)\,dx\,du$) and $f_U$ is the marginal density of U. The conditional mean
provides a lower bound for the error $\|X - PX\|$: E(X | U) is the best
approximation of X by a function of U in the sense of the norm $\|\cdot\|$, so that
$\|X - \varphi(U)\| \ge \|X - E(X \mid U)\|$ for any approximation $Z = \varphi(U) \in S$.

Example 3.1.- Let us consider $\Omega = (0,1)$ and P((a,b)) = b - a
(i.e. $P(d\omega) = \ell(d\omega)$, where $\ell$ is the Lebesgue measure). Let $U(\omega) = \sqrt{\omega}$
and $X(\omega) = \omega^2$. The cumulative function of the pair (U, X) is:

$$F(u,x) = P(U < u, X < x) = P(\sqrt{\omega} < u,\ \omega^2 < x) = P(\omega < u^2,\ \omega < \sqrt{x}),$$

so that:

$$F(u,x) = \min\{u^2, \sqrt{x}\}.$$

The joint probability density is:

$$f(u,x) = \frac{\partial^2 F}{\partial u\, \partial x}(u,x).$$
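This derivative must be evaluated in its variational form: at a given point
(u, x), we consider the ball $B_\varepsilon$ of radius $\varepsilon > 0$ and the set $\mathcal{D}(B_\varepsilon)$ formed by
the functions having compact support in $B_\varepsilon$. Then, we have:

$$\int_{B_\varepsilon} f\, \phi\; du\,dx = \int_{B_\varepsilon} F\, \frac{\partial^2 \phi}{\partial u\, \partial x}\; du\,dx, \quad \forall \phi \in \mathcal{D}(B_\varepsilon).$$

Let:

$$B_\varepsilon^< = \{(u,x) \in B_\varepsilon : u^2 < \sqrt{x}\}; \qquad B_\varepsilon^> = \{(u,x) \in B_\varepsilon : u^2 > \sqrt{x}\};$$

$$\Sigma = \{(u,x) : u^2 = \sqrt{x},\ 0 < u, x < 1\}, \qquad \Sigma_\varepsilon = \{(u,x) \in B_\varepsilon : u^2 = \sqrt{x}\}.$$

We have:

$$(u,x) \in B_\varepsilon^< \implies f(u,x) = 0$$

and, in an analogous manner:

$$(u,x) \in B_\varepsilon^> \implies f(u,x) = 0,$$

so that f is concentrated on the curve $\Sigma$: we have $f = A(u,x)\, \delta_\Sigma$.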

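So, by using that $u^2 = \sqrt{x} \iff u^4 = x$, we have:

$$f(x \mid U = u) = A(u,x)\, \delta_\Sigma(u,x) \Big/ \int_{\mathbb{R}} A(u,x)\, \delta_\Sigma\, dx =
\begin{cases} 0, & \text{if } x \neq u^4, \\ 1, & \text{if } x = u^4. \end{cases}$$

Thus:

$$E(X \mid U = u) = u^4 \quad \text{and} \quad E(X \mid U) = U^4.$$

A(u, x) can be determined: indeed, let us introduce the unitary normal
$n = (n_u, n_x)$ oriented outwards the region $B_\varepsilon^<$:

$$n_u = \frac{N_u}{\sqrt{N_u^2 + N_x^2}}, \quad n_x = \frac{N_x}{\sqrt{N_u^2 + N_x^2}}, \quad N_u = 2u, \quad N_x = -\frac{1}{2\sqrt{x}}.$$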
y/N^+Nf 
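We have, at any point $(u,x) \in \Sigma$:

$$\int_{B_\varepsilon} F\, \frac{\partial^2 \phi}{\partial u\, \partial x}\; du\,dx =
\int_{B_\varepsilon^<} u^2\, \frac{\partial^2 \phi}{\partial u\, \partial x}\; du\,dx +
\int_{B_\varepsilon^>} \sqrt{x}\, \frac{\partial^2 \phi}{\partial u\, \partial x}\; du\,dx.$$

Since

$$\int_{B_\varepsilon^<} u^2\, \frac{\partial^2 \phi}{\partial u\, \partial x} =
\int_{B_\varepsilon^<} \operatorname{div}\left( u^2 \frac{\partial \phi}{\partial x},\ -N_u\, \phi \right),$$

we have (Green's theorem):

$$\int_{B_\varepsilon^<} u^2\, \frac{\partial^2 \phi}{\partial u\, \partial x} =
\int_{\Sigma_\varepsilon} \left[ n_u\, u^2\, \frac{\partial \phi}{\partial x} - n_x N_u\, \phi \right].$$

In an analogous manner:

$$\int_{B_\varepsilon^>} \sqrt{x}\, \frac{\partial^2 \phi}{\partial u\, \partial x} =
\int_{B_\varepsilon^>} \operatorname{div}\left( \sqrt{x}\, \frac{\partial \phi}{\partial x},\ 0 \right)$$

and, since the normal oriented outwards $B_\varepsilon^>$ is $-n$ on $\Sigma_\varepsilon$, we have:

$$\int_{B_\varepsilon^>} \sqrt{x}\, \frac{\partial^2 \phi}{\partial u\, \partial x} =
- \int_{\Sigma_\varepsilon} n_u\, \sqrt{x}\, \frac{\partial \phi}{\partial x}.$$

Thus, by using that $u^2 = \sqrt{x}$ on $\Sigma_\varepsilon$, we also have:

$$\int_{B_\varepsilon} F\, \frac{\partial^2 \phi}{\partial u\, \partial x}\; du\,dx =
\int_{\Sigma_\varepsilon} \left[ n_u \left(u^2 - \sqrt{x}\right) \frac{\partial \phi}{\partial x} - n_x N_u\, \phi \right] =
- \int_{\Sigma_\varepsilon} n_x N_u\, \phi = \left( -n_x N_u\, \delta_\Sigma \right)(\phi),$$

and $A = -n_x N_u = -n_u N_x$, which is positive since $N_x < 0$. ■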
Example 3.1 is implemented by a Matlab program that generates a sample
of the pair (U, X) and uses this sample in order to determine, on the one
hand, the empirical cumulative distribution function and, on the other hand,
the associated density. These quantities are obtained by using the programs
presented in section 1.5. The results obtained from two samples of 10,000
variates are shown in Figure 3.1 below. The first sample is random, obtained
by using rand, while the second is composed of equally spaced values of $\omega$.
The value of the parameter h is h = 0.02, which coincides with the step used in
the equally spaced abscissa.



Figure 3.1. Empirical density from a sample of 10,000 realizations of the pair
with h = 0.02. For a color version of the figure, see
www.iste.co.uk/souzadecursi/quantification.zip


3.1.2. Using the mean for the approximation by a constant 

The simplest approximation for the random variable X consists of
determining a constant corresponding to the element of the linear subspace:

$$S = \left\{ Z \in L^2(\Omega) : Z \text{ is constant: } Z(\omega) = s \in \mathbb{R},\ \forall \omega \in \Omega \right\}$$

which is the closest to X. As shown previously (section 1.11.1.2), the
solution is the mean or expectation of X: PX = m = E(X). The error
dist(X, S) in the approximation is the standard deviation of X.
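Example 3.2.- Let us consider $\Omega = (0,1)$ and P((a,b)) = b - a
(i.e. $P(d\omega) = \ell(d\omega)$, where $\ell$ is the Lebesgue measure). Let $U(\omega) = \sqrt{\omega}$
and $X(\omega) = \omega^2$. We have:

$$E(U) = \int_\Omega U(\omega)\, P(d\omega) = \int_0^1 \sqrt{\omega}\, d\omega = \frac{2}{3}$$

and

$$E(X) = \int_\Omega X(\omega)\, P(d\omega) = \int_0^1 \omega^2\, d\omega = \frac{1}{3}.$$

The variances are:

$$V(U) = \int_\Omega \left(U(\omega) - E(U)\right)^2 P(d\omega) = \int_0^1 \left(\sqrt{\omega} - \frac{2}{3}\right)^2 d\omega = \frac{1}{18}$$

and

$$V(X) = \int_\Omega \left(X(\omega) - E(X)\right)^2 P(d\omega) = \int_0^1 \left(\omega^2 - \frac{1}{3}\right)^2 d\omega = \frac{4}{45}.$$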
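The result may also be obtained by using the probability density of the
variable U. For instance, the cumulative function of U is:

$$F_U(u) = P(U < u) = \begin{cases} 0, & \text{if } u \le 0, \\ u^2, & \text{if } 0 < u < 1, \\ 1, & \text{if } u \ge 1, \end{cases}$$

so that its probability density is:

$$f_U(u) = \begin{cases} 0, & \text{if } u < 0 \text{ or } u > 1, \\ 2u, & \text{if } 0 < u < 1. \end{cases}$$

Thus:

$$E(U) = \int u f_U(u)\, du = \int_0^1 2u^2\, du = \frac{2}{3}$$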
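and

$$V(U) = \int \left(u - \frac{2}{3}\right)^2 f_U(u)\, du = \int_0^1 2u \left(u - \frac{2}{3}\right)^2 du = \frac{1}{18}.$$

In an analogous manner:

$$F_X(x) = P(X < x) = \begin{cases} 0, & \text{if } x \le 0, \\ \sqrt{x}, & \text{if } 0 < x < 1, \\ 1, & \text{if } x \ge 1, \end{cases}
\qquad
f_X(x) = \begin{cases} 0, & \text{if } x < 0 \text{ or } x > 1, \\ \dfrac{1}{2\sqrt{x}}, & \text{if } 0 < x < 1. \end{cases}$$

Thus:

$$E(X) = \int x f_X(x)\, dx = \int_0^1 \frac{1}{2} \sqrt{x}\, dx = \frac{1}{3}$$

and

$$V(X) = \int \left(x - \frac{1}{3}\right)^2 f_X(x)\, dx = \int_0^1 \frac{1}{2\sqrt{x}} \left(x - \frac{1}{3}\right)^2 dx = \frac{4}{45}.$$ ■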

Example 3.2 may be implemented in Matlab by using the programs 
introduced in section 1.10.1. 
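
The values of Example 3.2 can also be checked by direct quadrature. The sketch below is an assumption-laden alternative to the programs of section 1.10.1 (which are not reproduced here): it uses the built-in function integral, and all variable names are illustrative.

```matlab
% Numerical check of Example 3.2 by quadrature over omega in (0,1).
U  = @(om) sqrt(om);
X  = @(om) om.^2;
EU = integral(U, 0, 1);                          % mean of U
EX = integral(X, 0, 1);                          % mean of X
VU = integral(@(om) (U(om) - EU).^2, 0, 1);      % variance of U
VX = integral(@(om) (X(om) - EX).^2, 0, 1);      % variance of X
% Same mean computed through the density f_U(u) = 2u on (0,1)
EU_density = integral(@(u) u.*(2*u), 0, 1);
distX = sqrt(VX);   % error of the approximation of X by the constant E(X)
```

The last line evaluates dist(X, S) for the approximation by a constant, i.e. the standard deviation of X.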


3.1.3. Using the linear correlation in order to construct a representation 

As shown in section 1.11.1.3, the best approximation of X by an affine
function of U is the element of the linear subspace:

$$S = \left\{ s \in [L^2(\Omega)]^k : s = \alpha U + \beta;\ \alpha \in \mathcal{M}(k,m),\ \beta \in \mathbb{R}^k \right\},$$

which is the closest to X. The solution is PX = AU + B, where A is determined
by solving, for i = 1, ..., k:

$$C.a = b$$

with $a = (a_1, \dots, a_m)^t \in \mathcal{M}(m,1)$ such that $a_j = A_{ij}$, $C \in \mathcal{M}(m,m)$
such that $C_{rj} = Cov(U_j, U_r)$ and $b = (b_1, \dots, b_m)^t \in \mathcal{M}(m,1)$ such that
$b_r = Cov(X_i, U_r)$. After the determination of A, B is given by:

$$B = E(X) - A.E(U).$$

In the situation where k = m = 1, we have PX = aU + b, with:

$$a = \frac{Cov(X,U)}{V(U)}; \qquad b = E(X) - aE(U),$$

and the error may be determined by using the linear correlation coefficient:

$$\rho(X,U) = \frac{Cov(X,U)}{\sqrt{V(X)V(U)}}.$$

We have:

$$\text{dist}(X,S) = \sqrt{V(X)\left(1 - [\rho(X,U)]^2\right)}.$$

Example 3.3.- Let us consider $\Omega = (0,1)$ and P((a,b)) = b - a
(i.e. $P(d\omega) = \ell(d\omega)$, where $\ell$ is the Lebesgue measure). Let $U(\omega) = \sqrt{\omega}$
and $X(\omega) = \omega^2$. We have:

$$E(U) = \frac{2}{3}, \quad V(U) = \frac{1}{18}, \quad E(X) = \frac{1}{3}, \quad V(X) = \frac{4}{45}.$$

In addition:

$$E(UX) = \int_\Omega U(\omega) X(\omega)\, P(d\omega) = \int_0^1 \omega^2 \sqrt{\omega}\, d\omega = \frac{2}{7},$$

so that:

$$Cov(U,X) = E(UX) - E(U)E(X) = \frac{4}{63}.$$

Thus:

$$a = \frac{Cov(U,X)}{V(U)} = \frac{4/63}{1/18} = \frac{8}{7},$$

$$b = E(X) - aE(U) = \frac{1}{3} - \frac{8}{7} \times \frac{2}{3} = -\frac{3}{7},$$

$$\rho(U,X) = \frac{Cov(U,X)}{\sqrt{V(U)V(X)}} = \frac{2\sqrt{10}}{7} \approx 0.903508;$$

$$\|X - aU - b\| = \sqrt{V(X)\left(1 - [\rho(U,X)]^2\right)} = \frac{2}{7\sqrt{5}} \approx 0.127775.$$

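Indeed, if:

$$J(a,b) = \int_\Omega \left(X(\omega) - aU(\omega) - b\right)^2 P(d\omega) = \int_0^1 \left(\omega^2 - a\sqrt{\omega} - b\right)^2 d\omega,$$

we have:

$$J(a,b) = \frac{1}{5} + \frac{a^2}{2} - \frac{2b}{3} + b^2 + \frac{4}{21}\, a\,(-3 + 7b),$$

so that:

$$\frac{\partial J}{\partial a} = a + \frac{4}{3}\, b - \frac{4}{7}$$

and

$$\frac{\partial J}{\partial b} = \frac{2}{3}\,(2a + 3b - 1).$$

Thus, the minimum of J is attained at (a, b), solution of the linear system:

$$a + \frac{4}{3}\, b = \frac{4}{7}; \qquad 2a + 3b = 1.$$

So, $a = \frac{8}{7}$ and $b = -\frac{3}{7}$. Let us note that:

$$E(U^2) = \int_\Omega U(\omega)^2\, P(d\omega) = \int_0^1 \omega\, d\omega = \frac{1}{2},$$

while:

$$E(U) = \frac{2}{3}, \quad E(X) = \frac{1}{3}, \quad E(UX) = \frac{2}{7}.$$

Thus, the equations

$$aE(U^2) + bE(U) = E(UX); \qquad aE(U) + b = E(X)$$

may be written as:

$$\frac{1}{2}\, a + \frac{2}{3}\, b = \frac{2}{7} \iff a + \frac{4}{3}\, b = \frac{4}{7}$$

and

$$\frac{2}{3}\, a + b = \frac{1}{3} \iff 2a + 3b = 1.$$ ■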

Example 3.3 may be implemented in Matlab by using the programs 
introduced in section 1.10.2. 
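
A short numerical check of Example 3.3 is sketched below. It does not use the programs of section 1.10.2; instead it assumes the built-in function integral for the expectations, and the helper E and the other names are illustrative only.

```matlab
% Numerical check of Example 3.3: affine approximation X ~ a*U + b.
U   = @(om) sqrt(om);
X   = @(om) om.^2;
E   = @(f) integral(f, 0, 1);            % expectation under P(d omega) = d omega
EU  = E(U);
EX  = E(X);
VU  = E(@(om) (U(om) - EU).^2);
VX  = E(@(om) (X(om) - EX).^2);
CUX = E(@(om) U(om).*X(om)) - EU*EX;     % Cov(U, X)
a   = CUX/VU;                            % slope
b   = EX - a*EU;                         % intercept
rho = CUX/sqrt(VU*VX);                   % linear correlation coefficient
err = sqrt(VX*(1 - rho^2));              % error from the correlation formula
errJ = sqrt(E(@(om) (X(om) - a*U(om) - b).^2));   % direct value of sqrt(J(a,b))
```

The two error evaluations (err and errJ) agree up to quadrature accuracy, which illustrates the relation between the distance and the correlation coefficient.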


3.1.4. Polynomial approximation 

A classical (and often useful) expansion consists of approximating X by a
polynomial function of U. In this case, we must consider a basis
$\mathcal{B} = \{\varphi_i(U) : 1 \le i \le N_X\}$ containing $N_X$ linearly independent polynomials
in the variable U (such as, for instance, the polynomials 1; $U_r$ for $1 \le r \le m$;
$U_r U_s$ for $1 \le r, s \le m$, etc.) and, as previously introduced, take as unknowns
the coefficients $x = (x_1, \dots, x_{N_X})$, with $x_i \in \mathbb{R}^k$, of the expression of PX. For
instance, when we are interested in a polynomial where the maximal degree
in component $U_i$ is $d_i - 1 \ge 0$, we set $d = (d_1, \dots, d_m)$, $\alpha = (\alpha_1, \dots, \alpha_m)$
and we consider:

$$A(d) = \left\{ \alpha \in \mathbb{N}^m : 1 \le \alpha \le d \ \text{(i.e. } 1 \le \alpha_i \le d_i \text{ for } 1 \le i \le m) \right\}$$

and

$$\mathcal{B} = \left\{ U_1^{\alpha_1 - 1} U_2^{\alpha_2 - 1} \cdots U_m^{\alpha_m - 1} : \alpha \in A(d) \right\}.$$

We observe that, in this case, the term $U_i$ may take any exponent going from
0 to $d_i - 1$, so that $N_X = \prod_{i=1}^{m} d_i$. In order to apply the procedure exposed,
it is necessary to introduce a map that indexes the elements $\alpha \in A(d)$ into a
single index. For instance, we may consider Morton ordering or, more simply,
an index map given recursively by:

$$index(\alpha_1, \dots, \alpha_m) = index(\alpha_1, \dots, \alpha_{m-1}) + (\alpha_m - 1) * index(d_1, \dots, d_{m-1}), \qquad index(\alpha_1) = \alpha_1.$$
Such a map is implemented in Matlab as follows: 

Listing 3.1. Index map 

function ind = index_map(alfa, d)
% maps the multi-index alfa (1 <= alfa(i) <= d(i)) to a single index
if length(d) == 1
    ind = alfa(1);
elseif length(d) == 2
    ind = alfa(1) + (alfa(2) - 1)*d(1);
else
    ind = index_map(alfa(1:end-1), d(1:end-1)) + (alfa(end) - 1) ...
        *index_map(d(1:end-1), d(1:end-1));
end;
return;
end

function alfa = inverse_index_map(ind, d)
% recovers the multi-index alfa from the single index ind
alfa = zeros(size(d));
iind = ind;
if length(d) > 1
    for j = 2:length(d)
        jj = length(d) - j + 2;
        m = index_map(d(1:jj-1), d(1:jj-1));
        alfa(jj) = floor(iind/m) + 1;
        ii = mod(iind, m);
        if ii > 0
            iind = ii;
        else
            alfa(jj) = alfa(jj) - 1;
            iind = iind - (alfa(jj) - 1)*m;
        end;
    end;
end;
if iind > 0
    alfa(1) = iind;
else
    alfa(1) = d(1);
    alfa(2) = alfa(2) - 1;
end;
return;
end
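
As a side remark, unrolling the recursion gives index(d_1, ..., d_j) = d_1 * ... * d_j, so the map is the usual column-major linear index. The sketch below, with hypothetical sizes d and multi-index alfa, compares the closed form with the Matlab built-ins sub2ind and ind2sub; it is an illustration only and does not call the functions of Listing 3.1.

```matlab
% Closed form of the recursive index map and comparison with built-ins.
d    = [3 4 2];                        % hypothetical sizes d_1, d_2, d_3
alfa = [2 3 2];                        % multi-index with 1 <= alfa(i) <= d(i)
strides = [1, cumprod(d(1:end-1))];    % 1, d_1, d_1*d_2
ind  = 1 + sum((alfa - 1).*strides);   % alfa_1 + (alfa_2-1)*d_1 + (alfa_3-1)*d_1*d_2
ind_builtin = sub2ind(d, alfa(1), alfa(2), alfa(3));   % same linear index
[a1, a2, a3] = ind2sub(d, ind);        % inverse map
```

Here both ind and ind_builtin equal 2 + 2*3 + 1*12 = 20, and ind2sub returns the original multi-index.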


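By using these maps, we associate to a multi-index $(\alpha_1, \dots, \alpha_m)$ a unique
index $0 < i \le N_X$ and conversely. Let us introduce

$$\varphi_i(U) = U_1^{\alpha_1 - 1} U_2^{\alpha_2 - 1} \cdots U_m^{\alpha_m - 1}, \quad i = index(\alpha_1, \dots, \alpha_m).$$

Then,

$$S = \left\{ s \in [L^2(\Omega)]^k : s = \sum_{i=1}^{N_X} s_i \varphi_i(U);\ s_i \in \mathbb{R}^k \right\},$$

so that

$$PX = \sum_{j=1}^{N_X} x_j \varphi_j(U)$$

and

$$\left( \sum_{j=1}^{N_X} x_j \varphi_j(U),\ \varphi_i(U) \right) = \left( X, \varphi_i(U) \right), \quad \forall\ 1 \le i \le N_X.$$

This last equation provides a linear system for the determination of the
coefficients $x_i$.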

A particular situation of interest is the one where U is a Gaussian vector
(see section 1.13.4; a vector is said to be Gaussian if, for instance, all its
components $U_i$ are Gaussian and independent). In this case, the approach may
be connected with Wiener's approach (see [WIE 38, CAM 47, GHA 91]), which
has generated the approximation by polynomial chaos.

This approach may be implemented in Matlab by using the programs of 
section 3.1.5. 

Example 3.4.- Let us consider $\Omega = (0,1)$ and P((a,b)) = b - a (i.e.
$P(d\omega) = \ell(d\omega)$, where $\ell$ is the Lebesgue measure). Let $U(\omega) = \sqrt{\omega}$ and
$X(\omega) = \omega^2$. In order to approximate X by a polynomial function of U,
having degree d, we must determine:

$$PX = \sum_{j=0}^{d} x_j U^j,$$

where the d + 1 coefficients $x_0, x_1, \dots, x_d$ verify:

$$\left( \sum_{j=0}^{d} x_j U^j,\ U^i \right) = \left( X, U^i \right), \quad 0 \le i \le d,$$

i.e.

$$\sum_{j=0}^{d} x_j E\left(U^{i+j}\right) = E\left(XU^i\right), \quad 0 \le i \le d.$$

By setting $\mathbf{X} = (x_0, \dots, x_d)^t$, we have:

$$A\mathbf{X} = \mathbf{B},$$

where:

$$A_{ij} = E\left(U^{i+j}\right) = \int_\Omega [U(\omega)]^{i+j}\, P(d\omega) = \int_0^1 \omega^{(i+j)/2}\, d\omega = \frac{2}{2+i+j}, \quad 0 \le i, j \le d$$

and

$$B_i = E\left(XU^i\right) = \int_\Omega X(\omega) [U(\omega)]^i\, P(d\omega) = \int_0^1 \omega^{(i+4)/2}\, d\omega = \frac{2}{6+i}, \quad 0 \le i \le d.$$

The solution provided by Matlab for different values of d is shown below: 


    d    X
    0    (0.3333)
    1    (-0.4286, 1.1429)
    2    (0.2143, -1.4286, 2.1429)
    3    (-0.0397, 0.4762, -1.6667, 2.2222)
    4    (0.0000, -0.0000, 0.0000, -0.0000, 1.0000)
    5    (0.0000, -0.0000, 0.0000, -0.0000, 1.0000, -0.0000)
    6    (0.0000, -0.0000, 0.0000, -0.0000, 1.0000, -0.0000, 0.0000)
We observe that:

- for d = 0, it coincides with the approximation of X by a constant: the
exact result is $PX = \frac{1}{3} \approx 0.3333$;

- for d = 1, it is equivalent to approximating X by an affine function of U:
the exact result is $PX = -\frac{3}{7} + \frac{8}{7} U$, which corresponds to
$\mathbf{X} = \left(-\frac{3}{7}, \frac{8}{7}\right) \approx (-0.4286, 1.1429)$;

- for $d \ge 4$, the result is $PX = U^4 = E(X \mid U)$, which corresponds to
the exact representation.
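
The linear systems of Example 3.4 are small enough to be assembled directly from the closed forms of $A_{ij}$ and $B_i$. The following sketch is one possible way to do so (the names dmax and coef are illustrative and do not come from the programs of section 3.1.5):

```matlab
% Sketch: coefficients of the polynomial approximation of Example 3.4.
dmax = 4;
coef = cell(dmax + 1, 1);              % coef{d+1} holds (x_0, ..., x_d)
for d = 0:dmax
    [J, I] = meshgrid(0:d, 0:d);       % I: row exponent i, J: column exponent j
    A = 2./(2 + I + J);                % A_ij = E(U^(i+j)) = 2/(2+i+j)
    B = 2./(6 + (0:d)');               % B_i  = E(X U^i)   = 2/(6+i)
    coef{d + 1} = A \ B;               % solution of A X = B
    fprintf('d = %d: %s\n', d, mat2str(coef{d + 1}', 4));
end
```

The matrix A is of Hilbert type and becomes ill-conditioned as d grows, which is why only moderate degrees are used in this sketch.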


3.1.5. General finite-dimensional approximations 

A usual situation is that where S is a finite-dimensional subspace, i.e. a
subspace having a basis formed by a finite number n of elements:

$$F = \{\varphi_1(U), \dots, \varphi_n(U)\}$$

such that, for k = 1 (the extension to k > 1 is obtained by applying the results
below to each component):

$$S = \left\{ s \in L^2(\Omega) : s = \sum_{i=1}^{n} s_i \varphi_i(U);\ s_i \in \mathbb{R},\ 1 \le i \le n \right\}.$$

In this case:

$$PX = \sum_{j=1}^{n} x_j \varphi_j(U) \in S$$

and

$$\left( \sum_{j=1}^{n} x_j \varphi_j(U),\ \varphi_i(U) \right) = \left( X, \varphi_i(U) \right), \quad 1 \le i \le n.$$

Thus, $\mathbf{X} = (x_1, \dots, x_n)^t$ is the solution of the following linear system:

$$A\mathbf{X} = \mathbf{B}, \quad A_{ij} = \left( \varphi_j(U), \varphi_i(U) \right), \quad B_i = \left( X, \varphi_i(U) \right).$$
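We observe that this approximation is linear: if $\alpha \in \mathbb{R}$, we have
$P(X_1 + \alpha X_2) = PX_1 + \alpha PX_2$. Indeed, if:

$$(\mathbf{B}_1)_i = \left( X_1, \varphi_i(U) \right), \quad (\mathbf{B}_2)_i = \left( X_2, \varphi_i(U) \right), \quad B_i = (\mathbf{B}_1)_i + \alpha (\mathbf{B}_2)_i$$

and

$$A\mathbf{X}_1 = \mathbf{B}_1, \quad A\mathbf{X}_2 = \mathbf{B}_2,$$

then:

$$\mathbf{X} = \mathbf{X}_1 + \alpha \mathbf{X}_2$$

satisfies:

$$A\mathbf{X} = A\mathbf{X}_1 + \alpha A\mathbf{X}_2 = \mathbf{B}_1 + \alpha \mathbf{B}_2 = \mathbf{B},$$

and we have $P(X_1 + \alpha X_2) = PX_1 + \alpha PX_2$.

If the basis is orthogonal, i.e.:

$$\left( \varphi_i(U), \varphi_j(U) \right) = 0, \text{ if } i \neq j, \qquad \left( \varphi_i(U), \varphi_i(U) \right) > 0,$$

then:

$$A_{ij} = 0, \text{ if } i \neq j, \qquad A_{ii} > 0.$$

In this case, the solution is given by:

$$x_i = B_i / A_{ii} = \left( X, \varphi_i(U) \right) / \left( \varphi_i(U), \varphi_i(U) \right), \quad 1 \le i \le n.$$

If the basis is orthonormal, i.e.:

$$\left( \varphi_i(U), \varphi_j(U) \right) = 0, \text{ if } i \neq j, \qquad \left( \varphi_i(U), \varphi_i(U) \right) = 1,$$

then:

$$x_i = B_i = \left( X, \varphi_i(U) \right), \quad 1 \le i \le n.$$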
We observe that any basis $G = \{\psi_1(U), \dots, \psi_n(U)\}$ may be
orthonormalized using the Gram-Schmidt orthogonalization procedure:

$$\theta_1 = \psi_1; \qquad \varphi_1 = \theta_1 / \|\theta_1\|;$$

$$k > 1: \quad \theta_k = \psi_k - \sum_{i=1}^{k-1} \left( \psi_k, \varphi_i \right) \varphi_i; \qquad \varphi_k = \theta_k / \|\theta_k\|.$$
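
The Gram-Schmidt procedure can be sketched in Matlab when the scalar product is evaluated from a sample. In the example below, the sample of U = sqrt(omega), the monomial family 1, U, U^2, U^3 and all variable names are hypothetical choices made for the illustration.

```matlab
% Sketch: Gram-Schmidt orthonormalization with an empirical scalar product.
ns  = 2000;
Us  = sqrt(rand(ns, 1));                  % hypothetical sample of U = sqrt(omega)
Psi = [ones(ns, 1), Us, Us.^2, Us.^3];    % column k holds psi_k at the sample
sp  = @(Y, Z) mean(Y.*Z);                 % empirical scalar product (Y, Z)
n   = size(Psi, 2);
Phi = zeros(ns, n);                       % column k will hold phi_k at the sample
for k = 1:n
    theta = Psi(:, k);
    for i = 1:k-1
        % remove the component of psi_k along phi_i
        theta = theta - sp(Psi(:, k), Phi(:, i))*Phi(:, i);
    end
    Phi(:, k) = theta/sqrt(sp(theta, theta));   % normalize
end
G = (Phi'*Phi)/ns;                        % Gram matrix of the new family
```

The Gram matrix G is the identity up to rounding errors, so that, with this family, the coefficients reduce to $x_i = (X, \varphi_i(U))$.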

The finite-dimensional approximation of this section may be implemented in
Matlab as follows: assume that, on the one hand, $\varphi_k(U)$ is provided by a
subprogram phi(k,U) while, on the other hand, the scalar product of two
random variables Y and Z is provided by a subprogram scalprod(Y,Z). For
instance, if $Y = Y(\omega)$, $Z = Z(\omega)$ and the density of $\omega$ is $f_\omega(\omega)$, defined on
(a, b), scalprod(Y,Z) must return the value of $\int_a^b Y(\omega) Z(\omega) f_\omega(\omega)\, d\omega$. In
this case, the coefficients x are provided by the code

Listing 3.2. Scalar products calculated by integration 

function x = expcoef(phi, N_X, X, U, f_omega, a, b)
%
% determines the coefficients of the expansion
% by integrating the variables
%
% IN:
% phi : basis function phi(k,U) - type anonymous function
% N_X : order of the expansion - type integer
% X : the random variable to approach - type anonymous function
% U : the random variable argument of the expansion - type anonymous function
% f_omega : density of omega - type anonymous function
% a,b : bounds of integration - type double
%
% OUT:
% x : table 1 x N_X of coefficients - type array of double
%
[A, B] = variational_matrices(phi, N_X, X, U, f_omega, a, b);
x = A \ B;
x = x'; % transpose to 1 x N_X vector
return;
end

function [A, B] = variational_matrices(phi, N_X, X, U, f_omega, a, b)
%
% creates the matrices of the linear system
% by integrating the variables
%
% IN:
% phi : basis function phi(k,U) - type anonymous function
% N_X : order of the expansion - type integer
% X : the random variable to approach - type anonymous function
% U : the random variable argument of the expansion - type anonymous function
% f_omega : density of omega - type anonymous function
% a,b : bounds of integration - type double
%
% OUT:
% A : table N_X x N_X of scalar products - type array of double
%     A(i,j) = (phi(i,U), phi(j,U))
% B : table N_X x 1 of scalar products - type array of double
%     B(i) = (X, phi(i,U))
%
A = zeros(N_X, N_X);
B = zeros(N_X, 1);
for i = 1:N_X
    Y = @(om) phi(i, U(om));
    A(i,i) = scalprod(Y, Y, f_omega, a, b);
    Z = @(om) X(om);
    B(i) = scalprod(Y, Z, f_omega, a, b);
    for j = i+1:N_X
        Z = @(om) phi(j, U(om));
        aux = scalprod(Y, Z, f_omega, a, b);
        A(i,j) = aux; % A is symmetric
        A(j,i) = aux;
    end;
end;
return;
end

function v = scalprod(Y, Z, f_omega, a, b)
%
% evaluates the scalar product (Y,Z) by integration
%
% IN:
% Y : a random variable - type anonymous function
% Z : a random variable - type anonymous function
% f_omega : density of omega - type anonymous function
% a,b : bounds of integration - type double
%
% OUT:
% v : value of the scalar product - type double
%
f1 = @(om) Y(om).*Z(om).*f_omega(om); % elementwise, valid for arrays of om
v = numint(f1, a, b);
return;
end


If a sample $\{(Y_i, Z_i) : 1 \le i \le ns\}$ of the pair (Y, Z) is available,
scalprod(Y,Z) must return the empirical mean $\frac{1}{ns} \sum_{i=1}^{ns} Y_i Z_i$ of the product
YZ. Then, we may use the code:

Listing 3.3. Scalar products evaluated from a sample 
function x = expcoef(phi, N_X, Xs, Us)
%
% determines the coefficients of the expansion
% using a sample (U_i, X_i) from (U,X)
%
% IN:
% phi : basis function phi(k,U) - type anonymous function
% N_X : order of the expansion - type integer
% Xs : table of values of X - type array of double
% Us : table of values of U - type array of double
%      (Us(i), Xs(i)) is a variate from (U,X)
%
% OUT:
% x : table 1 x N_X of coefficients - type array of double
%
[A, B] = variational_matrices(phi, N_X, Xs, Us);
x = A \ B;
x = x'; % transpose to 1 x N_X vector
return;
end

function [A, B] = variational_matrices(phi, N_X, Xs, Us)
%
% creates the matrices of the linear system by using
% scalar products evaluated from a sample
%
% IN:
% phi : basis function phi(k,U) - type anonymous function
% N_X : order of the expansion - type integer
% Xs : table of values of X - type array of double
% Us : table of values of U - type array of double
%      (Us(i), Xs(i)) is a variate from (U,X)
%
% OUT:
% A : table N_X x N_X of scalar products - type array of double
%     A(i,j) = (phi(i,U), phi(j,U))
% B : table N_X x 1 of scalar products - type array of double
%     B(i) = (X, phi(i,U))
%
A = zeros(N_X, N_X);
B = zeros(N_X, 1);
for i = 1:N_X
    f1 = @(U) phi(i, U);
    Y = map(f1, Us, 1);
    A(i,i) = scalprod(Y, Y);
    Z = Xs;
    B(i) = scalprod(Y, Z);
    for j = i+1:N_X
        f1 = @(U) phi(j, U);
        Z = map(f1, Us, 1);
        aux = scalprod(Y, Z);
        A(i,j) = aux; % A is symmetric
        A(j,i) = aux;
    end;
end;
return;
end

function v = scalprod(Y, Z)
%
% evaluates the scalar product (Y,Z) by using a sample
%
% IN:
% Y : table of values of Y - type array of double
% Z : table of values of Z - type array of double
%     (Y(i), Z(i)) is a variate from (Y,Z)
%
% OUT:
% v : value of the scalar product - type double
%
v = mean(Y.*Z);
return;
end


Once the coefficients have been obtained, the values of the projection PX 
at the values of a sample Us of ns variates from U are provided by the code: 
Listing 3.4. Values of the projection PX at the points of a sample

function PX = projection(c, Us, phi)
%
% determines the projection PX at the points Us
% assumes that Us contains ns points of dimension k
% each line of Us is a point
% assumes that c contains N_X vectors of dimension n
% each column of c is a vector
%
% IN:
% c : table n x N_X of the coefficients of the expansion - type array of double
% Us : table ns x k of the sample points - type array of double
% phi : basis function phi(k,U) - type anonymous function
%
% OUT:
% PX : table ns x n of the values of the projection - type array of double
%
ns = size(Us, 1);
n = size(c, 1);
N_X = size(c, 2);
PX = zeros(n, ns);
for i = 1:N_X
    c_i = c(:, i);
    for j = 1:ns
        PX(:,j) = PX(:,j) + c_i*phi(i, Us(j,:));
    end;
end;
PX = PX'; % transpose PX to ns x n table
return;
end


Example 3.5.- Let us consider $\Omega = (0,1)$ and P((a,b)) = b - a (i.e.
$P(d\omega) = \ell(d\omega)$, where $\ell$ is the Lebesgue measure). Let $U(\omega) = \sqrt{\omega}$ and
$X(\omega) = \omega^2$. In the last example, we have considered the approximation of
X by a polynomial of degree d in U, which corresponds to $\varphi_i(U) = U^{i-1}$,
$0 < i \le d + 1$. Other families may be considered, such as, for instance:

$$\varphi_1(U) = 1; \quad \varphi_{2k}(U) = \sin(kU), \quad \varphi_{2k+1}(U) = \cos(kU), \quad 1 \le k \le d \quad (n = 2d + 1).$$
In this case, we have:

$$B_1 = \frac{1}{3},$$

$$B_{2k} = -\frac{2}{k^6} \left( k \left(120 - 20k^2 + k^4\right) \cos(k) - 5 \left(24 - 12k^2 + k^4\right) \sin(k) \right) \quad (k \ge 1),$$

$$B_{2k+1} = \frac{2}{k^6} \left( -120 + 5 \left(24 - 12k^2 + k^4\right) \cos(k) + k \left(120 - 20k^2 + k^4\right) \sin(k) \right) \quad (k \ge 1),$$

$$A_{1,1} = 1,$$

$$A_{1,2k} = \frac{2}{k^2} \left( \sin(k) - k\cos(k) \right) \quad (k \ge 1),$$

$$A_{1,2k+1} = \frac{2}{k^2} \left( \cos(k) + k\sin(k) - 1 \right) \quad (k \ge 1),$$

$$A_{2k,2k} = \frac{1 + 2k^2 - \cos(2k) - 2k\sin(2k)}{4k^2} \quad (k \ge 1),$$

$$A_{2k,2k+1} = \frac{\sin(2k) - 2k\cos(2k)}{4k^2} \quad (k \ge 1),$$

$$A_{2k+1,2k+1} = \frac{2k^2 + \cos(2k) + 2k\sin(2k) - 1}{4k^2} \quad (k \ge 1),$$

and, for $k \neq p$,

$$A_{2k,2p} = \frac{2}{(k^2 - p^2)^2} \Big( p\,(k^2 - p^2) \cos(p)\sin(k) + k\,(p^2 - k^2) \cos(k)\sin(p)
+ (k^2 + p^2) \sin(k)\sin(p) + 2kp \left(\cos(k)\cos(p) - 1\right) \Big),$$

$$A_{2k,2p+1} = \frac{2}{(k^2 - p^2)^2} \Big( k\,(p^2 - k^2) \cos(p)\cos(k) + p\,(p^2 - k^2) \sin(k)\sin(p)
- 2kp \cos(k)\sin(p) + (k^2 + p^2) \cos(p)\sin(k) \Big),$$

$$A_{2k+1,2p+1} = \frac{2}{(k^2 - p^2)^2} \Big( k\,(k^2 - p^2) \cos(p)\sin(k) + p\,(p^2 - k^2) \cos(k)\sin(p)
+ (k^2 + p^2) \left(\cos(p)\cos(k) - 1\right) + 2kp \sin(k)\sin(p) \Big).$$
Figure 3.2. Approximation using a trigonometrical basis. For a color version
of the figure, see www.iste.co.uk/souzadecursi/quantification.zip


The result obtained for d = 6 is shown in Figure 3.2. 

It is also interesting to compare the cumulative probability function and the 
density of probability (see Figure 3.3). ■ 
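
The closed forms of Example 3.5 can be cross-checked numerically. The sketch below is an illustration under assumptions, not the program used for Figures 3.2 and 3.3: it evaluates the scalar products by quadrature in the variable u = U(omega), using the density $f_U(u) = 2u$ of U on (0,1) and the built-in function integral, with a smaller basis (d = 2) than in the figures to limit ill-conditioning. The names phi, q and err2 are illustrative.

```matlab
% Sketch: trigonometric basis of Example 3.5, scalar products by quadrature.
d   = 2;
n   = 2*d + 1;
fU  = @(u) 2*u;                            % density of U = sqrt(omega) on (0,1)
% phi(1,u) = 1, phi(2k,u) = sin(k*u), phi(2k+1,u) = cos(k*u)
phi = @(i, u) (i == 1) + (mod(i, 2) == 0)*sin(floor(i/2)*u) ...
            + (i > 1 && mod(i, 2) == 1)*cos(floor(i/2)*u);
q   = @(g) integral(g, 0, 1, 'RelTol', 1e-12, 'AbsTol', 1e-14);
A   = zeros(n);
B   = zeros(n, 1);
for i = 1:n
    B(i) = q(@(u) u.^4.*phi(i, u).*fU(u)); % B_i = (X, phi_i(U)), X = U^4
    for j = i:n
        A(i, j) = q(@(u) phi(i, u).*phi(j, u).*fU(u));
        A(j, i) = A(i, j);                 % A is symmetric
    end
end
x    = A \ B;                              % coefficients of PX
err2 = 1/5 - 2*x'*B + x'*A*x;              % ||X - PX||^2, with E(X^2) = 1/5
```

For instance, A(1,2) computed in this way may be compared with the closed form $2(\sin 1 - \cos 1)$, and err2 with the squared error V(X) = 4/45 of the approximation by a constant.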




Figure 3.3. Approximation using a trigonometrical basis. For a color version 
of the figure, see www.iste.co.uk/souzadecursi/quantification.zip 


Example 3.6.- We stress that the existence of a dependence between the
variables U and X is essential. Indeed, if the variables are independent, then:

$$B_i = \left( X, \varphi_i(U) \right) = E\left( X \varphi_i(U) \right) = E(X)\, E\left( \varphi_i(U) \right),$$

so that the solution of the linear system $A\mathbf{X} = \mathbf{B}$ is:

$$\mathbf{X} = E(X)\, \mathbf{X}_1,$$

where:

$$A\mathbf{X}_1 = \mathbf{B}_1, \quad (\mathbf{B}_1)_i = E\left( \varphi_i(U) \right).$$

Thus:

$$PX = E(X)\, P(1),$$

where 1 is a constant function taking the value 1 everywhere. ■
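
A small numerical illustration of this remark is sketched below, under assumptions that are not in the text: U = sqrt(omega) with the polynomial basis 1, U, U^2 (so that $E(U^{i+j}) = 2/(2+i+j)$), and a hypothetical variable X, independent of U, with mean 1/3.

```matlab
% Sketch: projection of a variable X independent of U (polynomial basis).
d  = 2;                              % basis 1, U, U^2
[J, I] = meshgrid(0:d, 0:d);
A  = 2./(2 + I + J);                 % A_ij = E(U^(i+j)) for U = sqrt(omega)
B1 = 2./(2 + (0:d)');                % (B_1)_i = E(U^i) = (1, U^i)
EX = 1/3;                            % mean of the hypothetical independent X
x  = A \ (EX*B1);                    % coefficients of PX under independence
x1 = A \ B1;                         % coefficients of P(1)
```

Since the constant function 1 belongs to this basis, P(1) = 1 and x reduces to (E(X), 0, 0): the projection carries no information beyond the mean of X.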

3.1.6. Approximation using a total family 

A first generalization of the preceding situation is obtained by considering
a subspace S possessing a total family F, i.e. a countable family F such that
its finite linear span (the set of the finite linear combinations of elements from
F) is dense in S (see [DE 08]). We consider k = 1 (the extension to k > 1 is
obtained by applying the procedure to each component):

$$F = \{\varphi_n\}_{n \in \mathbb{N}}; \quad [F] = \left\{ Z \in L^2(\Omega,P) : Z = \sum_{i=0}^{n} a_i \varphi_i,\ a_i \in \mathbb{R},\ n \in \mathbb{N} \right\}; \quad S = \overline{[F]}.$$

Then:

$$\forall\, \varepsilon > 0 : \exists\, U_\varepsilon \in [F] \text{ such that } \|X - U_\varepsilon\| \le \varepsilon.$$

3.1.6.1. Case of an increasing family of subspaces

The first popular situation corresponding to total families is the one where
[F] is parameterized by one dimension, i.e. [F] is the union of increasing
finite-dimensional subspaces:

$$[F] = \bigcup_{n=0}^{+\infty} [F_n]; \quad \dim([F_k]) = d_k < \infty; \quad d_{k+1} > d_k,\ \forall k \geq 0; \quad [F_k] \subset [F_{k+1}].$$

In this case, we may define $S_k = \overline{[F_k]} = [F_k]$ and we have:

$$S = \overline{\bigcup_{k=0}^{+\infty} S_k}; \quad \dim(S_k) = d_k < \infty; \quad d_{k+1} > d_k,\ \forall k \geq 0; \quad S_k \subset S_{k+1}.$$

For an arbitrary Z, we denote by $P_k Z$ the projection of Z on $S_k$ and
$c_k(Z) = \|Z - P_k Z\|$. Then, we have:

$$\|Z - PZ\| \leq c_{k+1}(Z) \leq c_k(Z), \quad \forall k \in \mathbb{N},$$

so that the sequence $\{c_k(Z)\}_{k \in \mathbb{N}}$ is decreasing and bounded below. Thus:

$$c_k(Z) \longrightarrow c(Z) \text{ for } k \longrightarrow +\infty.$$

For $Z \in S$, we have:

$$\forall\, \varepsilon > 0 : \exists\, Z_\varepsilon \in [F] \text{ such that } \|Z - Z_\varepsilon\| \leq \varepsilon.$$

Since $[F] = \bigcup_{n=0}^{+\infty} [F_n]$, we have $Z_\varepsilon \in S_k$ for some $k \in \mathbb{N}$. Thus:

$$c_k(Z) = \|Z - P_k(Z)\| \leq \|Z - Z_\varepsilon\| \leq \varepsilon.$$

As a result:

$$c_n(Z) \leq \varepsilon \text{ for any } n \geq k$$

and $c(Z) \leq \varepsilon$. Since $\varepsilon$ is arbitrary, it follows that c(Z) = 0. So:

$$\forall\, Z \in S : P_k(Z) \longrightarrow Z \text{ when } k \longrightarrow +\infty.$$

Since X - PX is orthogonal to S and $S_k \subset S$, we have $X - PX \perp S_k$ for
any $k \in \mathbb{N}$, so that:

$$\forall\, k \in \mathbb{N} : P_k(X - PX) = 0.$$

Then, the linearity of the projection operator on finite-dimensional
subspaces shows that $P_k X = P_k(PX)$. Since $PX \in S$, we have:

$$P_k X = P_k(PX) \longrightarrow PX \text{ for } k \longrightarrow +\infty.$$

So:

$$PX = \lim P_k X \quad \text{and} \quad \|X - PX\| = \lim \|X - P_k X\|.$$

Thus, $P_k X$ provides an approximation of PX. In practice, we do not
generate subspaces such that $S_k \subset S_{k+1}$, but:

$$\forall\, k : S_i \subset \bigcup_{n=k+1}^{+\infty} S_n \text{ for } i \leq k,$$

a condition that is satisfied when $\forall\, k : \exists\, n > 0$ such that $S_k \subset S_{k+n}$.

$P_k X$ may be determined in an analogous manner to the one used in the
case of finite-dimensional subspaces: if

$$F_k = \{\varphi_1(U), \ldots, \varphi_{d_k}(U)\}$$

is a basis of $S_k$, then:

$$P_k X = \sum_{j=1}^{d_k} x_{j,k}\,\varphi_j(U) \in S_k$$

and $X_k = (x_{1,k}, \ldots, x_{d_k,k})^t$ is the solution of the linear system:

$$A_k X_k = B_k, \quad (A_k)_{ij} = (\varphi_j(U), \varphi_i(U)), \quad (B_k)_i = (X, \varphi_i(U)).$$
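The decrease of $\|X - P_k X\|$ along an increasing family of subspaces can be observed numerically. The sketch below is only an illustration under assumptions of our own: it borrows $U = \sqrt{\omega}$ and $X = \omega^2 = U^4$ with $\omega$ uniform on (0, 1), so that $E(U^m) = 2/(m+2)$, and takes $S_k$ as the polynomials of degree at most k in U (monomial basis). The variable names are hypothetical.

```matlab
% Illustrative assumptions: U = sqrt(omega), X = U^4, omega uniform on (0,1),
% hence E(U^m) = 2/(m+2). S_k = polynomials of degree <= k in U.
kmax = 5;
err  = zeros(kmax + 1, 1);
EX2  = 2 / 10;                          % E(X^2) = E(U^8)
for k = 0:kmax
    idx = (0:k)';
    A = 2 ./ (idx + idx' + 2);          % (A_k)_{ij} = E(U^i U^j)
    B = 2 ./ (idx + 6);                 % (B_k)_i   = E(X U^i) = E(U^(i+4))
    x = A \ B;                          % coefficients of P_k X
    % ||X - P_k X||^2 = E(X^2) - (X, P_k X) for an orthogonal projection
    err(k + 1) = sqrt(max(EX2 - B' * x, 0));
end
disp(err');
```

The errors are non-increasing in k and vanish (up to rounding) as soon as $k \geq 4$, since X is then an element of $S_k$.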

3.1.6.2. Case of an orthogonal family

The second popular situation is the case where the family F is orthogonal,
i.e.:

$$(\varphi_m(U), \varphi_n(U)) = 0, \text{ if } m \neq n.$$

In this case, we define:

$$F_k = \{\varphi_0(U), \varphi_1(U), \ldots, \varphi_k(U)\}$$

and [F] is the union of an increasing family of finite-dimensional subspaces:

$$[F] = \bigcup_{n=0}^{+\infty} [F_n]; \quad \dim([F_k]) = d_k = k + 1 < \infty; \quad d_{k+1} > d_k,\ \forall k \geq 0; \quad [F_k] \subset [F_{k+1}].$$

In addition:

$$(A_k)_{ij} = (\varphi_j(U), \varphi_i(U)) = 0, \text{ if } i \neq j,$$

so that:

$$P_k X = \sum_{j=0}^{k} x_j\,\varphi_j(U) \in S_k,$$

where:

$$x_i = B_i / A_{ii} = (X, \varphi_i(U)) / (\varphi_i(U), \varphi_i(U)), \quad 0 \leq i \leq k.$$
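The formula $x_i = B_i / A_{ii}$ may be checked on a small case. The following sketch rests on assumptions that are not part of the text: U is taken uniform on (-1, 1) (density 1/2, which is compatible with the Legendre weight), X = U^3, and the scalar products are evaluated with Matlab's integral function.

```matlab
% Assumptions (ours): U uniform on (-1,1), f_U(u) = 1/2, X = U^3,
% orthogonal family = Legendre polynomials P_0..P_3.
P  = {@(u) ones(size(u)), @(u) u, @(u) (3*u.^2 - 1)/2, @(u) (5*u.^3 - 3*u)/2};
fU = @(u) 0.5 * ones(size(u));
X  = @(u) u.^3;
x  = zeros(1, numel(P));
for i = 1:numel(P)
    Bi  = integral(@(u) X(u)    .* P{i}(u) .* fU(u), -1, 1);  % (X, phi_i(U))
    Aii = integral(@(u) P{i}(u) .* P{i}(u) .* fU(u), -1, 1);  % (phi_i(U), phi_i(U))
    x(i) = Bi / Aii;                                          % no linear system needed
end
disp(x);
```

Since $u^3 = \frac{3}{5} P_1(u) + \frac{2}{5} P_3(u)$, the computed coefficients should be close to (0, 3/5, 0, 2/5).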

In this situation, classical orthogonal families associated with square
summable functions are often used. For instance, we may use a Fourier series
associated with a trigonometrical basis or orthogonal polynomials. We
underline that the orthogonality is connected to the distribution of the variable
U: let $f_U$ be the density of probability of U and let its range (i.e. the set
of the values of U) be denoted by D. Then:

$$(\varphi_i, \varphi_j) = E(\varphi_i(U)\,\varphi_j(U)) = \int_D \varphi_i(u)\,\varphi_j(u)\,f_U(u)\,du,$$

so that $f_U$ appears as the weight function associated with the family. In general,
$f_U = A\,w(u)$, where A is a normalizing constant and w is the usual weight
associated with the orthogonal family. For instance,

family                                         D              w(u)
Tchebichev 1st kind (T_n)                      (-1, 1)        1/sqrt(1 - u^2)
Tchebichev 2nd kind (U_n)                      (-1, 1)        sqrt(1 - u^2)
Legendre (P_n)                                 (-1, 1)        1
Laguerre (L_n)                                 (0, +oo)       exp(-u)
Hermite probabilistic (H_n)                    (-oo, +oo)     exp(-u^2/2)
Trigonometrical (1, sin(n pi u), cos(n pi u))  (-1, +1)       1

As a result, an a priori condition for orthogonality is the compatibility
between the distribution of U and the weight. For instance, the use of an
orthogonal family $H_n$ supposes a Gaussian density of probability. In practice,
we may overcome this limitation by using truncated series, which
corresponds to an approximation using a finite-dimensional subspace: the
family is used to generate a basis and the orthogonality is unnecessary. As in
the previous situation, we have:

$$PX = \lim P_k X \quad \text{and} \quad \|X - PX\| = \lim \|X - P_k X\|.$$

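3.1.6.3. Case of a Hilbert basis

A third situation of interest is the one where S possesses a Hilbert basis F,
i.e., a family F that is both countable and orthonormal:

$$F = \{\varphi_n\}_{n \in \mathbb{N}}, \quad (\varphi_m, \varphi_n) = \begin{cases} 1, & \text{if } m = n, \\ 0, & \text{if } m \neq n, \end{cases}$$

such that any element of S may be uniquely represented by a series (see
[DE 08]):

$$S = \left\{ s \in L^2(\Omega) : s = \sum_{i=0}^{+\infty} s_i\,\varphi_i(U);\ s_i \in \mathbb{R},\ \forall i \in \mathbb{N} \right\}.$$

In this case:

$$PX = \sum_{j=0}^{+\infty} x_j\,\varphi_j(U) \in S$$

and

$$\left( \sum_{j=0}^{+\infty} x_j\,\varphi_j(U),\ \varphi_i(U) \right) = (X, \varphi_i(U)), \quad \forall i \in \mathbb{N}.$$

Since the family is orthonormal, we have:

$$x_i = (X, \varphi_i(U)), \quad \forall i \in \mathbb{N}.$$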

Let us observe that any countable family $G = \{\psi_n\}_{n \in \mathbb{N}}$ such that

$$S = \left\{ s \in L^2(\Omega) : s = \sum_{i=0}^{+\infty} s_i\,\psi_i(U);\ s_i \in \mathbb{R},\ \forall i \in \mathbb{N} \right\}$$

(i.e. such that any element of S may be uniquely represented by a series) may
be transformed into a Hilbert basis (i.e. may be orthonormalized) by using
the Gram-Schmidt procedure:

$$\phi_0 = \psi_0; \quad \varphi_0 = \phi_0 / \|\phi_0\|;$$

$$n > 0 : \quad \phi_n = \psi_n - \sum_{i=0}^{n-1} (\psi_n, \varphi_i)\,\varphi_i; \quad \varphi_n = \phi_n / \|\phi_n\|.$$

In particular, any of the orthogonal families previously presented may be
transformed into an orthonormal one. When using a Hilbert basis, we also have:

$$PX = \lim P_k X \quad \text{and} \quad \|X - PX\| = \lim \|X - P_k X\|.$$
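The Gram-Schmidt procedure above can be sketched in Matlab when the scalar products are available in closed form. The example below is a hedged illustration, not the book's program: it assumes the family $\psi_n(u) = u^n$ and $U = \sqrt{\omega}$ with $\omega$ uniform on (0, 1), so that $(\psi_i, \psi_j) = E(U^{i+j}) = 2/(i+j+2)$. Each polynomial is stored by its monomial coefficients.

```matlab
% Assumptions (ours): psi_n(u) = u^n, n = 0..d, and E(U^m) = 2/(m+2).
d   = 4;
idx = (0:d)';
G   = 2 ./ (idx + idx' + 2);            % Gram matrix: G(i,j) = (psi_i, psi_j)
C   = zeros(d + 1);                     % column n+1 = coefficients of varphi_n
for n = 1:d + 1
    phi = zeros(d + 1, 1);
    phi(n) = 1;                         % start from psi_{n-1}
    for i = 1:n - 1
        % subtract the component (psi_{n-1}, varphi_i) varphi_i
        phi = phi - (C(:, i)' * G(:, n)) * C(:, i);
    end
    C(:, n) = phi / sqrt(phi' * G * phi);   % normalize: ||varphi_n|| = 1
end
disp(C' * G * C);                       % should be close to the identity
```

The product C' * G * C collects the scalar products $(\varphi_m, \varphi_n)$, so it is expected to be close to the identity matrix. For larger d, a modified Gram-Schmidt or a Cholesky-based variant would be numerically safer.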

Example 3.7.- Let us consider $\Omega = (0, 1)$ and P((a, b)) = b - a (i.e.
$P(d\omega) = \ell(d\omega)$, where $\ell$ is the Lebesgue measure). Let $U(\omega) = \sqrt{\omega}$ and
$X(\omega) = \omega^2$. Let us consider the family of the Legendre polynomials. We are
looking for:

$$P_k X = \sum_{j=0}^{k} x_j\,\varphi_j(U) \quad (\varphi_j = \text{Legendre polynomial of order } j).$$


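For k = 6, the linear system $A_k X_k = B_k$ involves a 7 x 7 matrix and a
vector of 7 components. Since the density of U is 2u on (0, 1) and $X = U^4$,
their entries are:

$$(A_k)_{ij} = E(\varphi_i(U)\,\varphi_j(U)) = \int_0^1 2u\,\varphi_i(u)\,\varphi_j(u)\,du, \quad (B_k)_i = E(X\varphi_i(U)) = \int_0^1 2u^5\,\varphi_i(u)\,du, \quad 0 \leq i, j \leq 6,$$

and the solution is:

$$X_k = \left( \tfrac{1}{5},\ 0,\ \tfrac{4}{7},\ 0,\ \tfrac{8}{35},\ 0,\ 0 \right)^t,$$

which corresponds to:

$$P_k X = \frac{1}{5} + \frac{2}{7}\left(-1 + 3U^2\right) + \frac{1}{35}\left(3 - 30U^2 + 35U^4\right),$$

i.e.:

$$P_k X = U^4,$$

which is the exact representation of X as a function of U. We may also consider
Laguerre polynomials: in this case:

$$P_k X = \sum_{j=0}^{k} x_j\,\varphi_j(U) \quad (\varphi_j = \text{Laguerre polynomial of order } j).$$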


For k = 6, the 7 x 7 matrix $A_k$ has the entries
$(A_k)_{ij} = E(\varphi_i(U)\,\varphi_j(U))$, $0 \leq i, j \leq 6$, and:

$$B_k = \left( \tfrac{1}{3},\ \tfrac{1}{21},\ -\tfrac{19}{168},\ -\tfrac{281}{1512},\ -\tfrac{1507}{7560},\ -\tfrac{14591}{83160},\ -\tfrac{43427}{332640} \right)^t$$

and the solution is:

$$X_k = \left( 24,\ -96,\ 144,\ -96,\ 24,\ 0,\ 0 \right)^t,$$

which corresponds to:

$$P_k X = 48 - 96\,(1 - U) - 96U + 72U^2 - 16U^3 + U^4 + 72\left(2 - 4U + U^2\right) - 16\left(6 - 18U + 9U^2 - U^3\right),$$

i.e.:

$$P_k X = U^4,$$


which corresponds to the exact representation. When using Hermite polynomials:

$$P_k X = \sum_{j=0}^{k} x_j\,\varphi_j(U) \quad (\varphi_j = \text{Hermite polynomial of order } j),$$

the matrix $(A_k)_{ij} = E(\varphi_i(U)\,\varphi_j(U))$ and the vector
$(B_k)_i = E(X\varphi_i(U))$ are evaluated for k = 6, so that the solution is:

$$X_k = \left( \tfrac{3}{4},\ 0,\ \tfrac{3}{4},\ 0,\ \tfrac{1}{16},\ 0,\ 0 \right)^t,$$

which corresponds to:

$$P_k X = \frac{3}{4} + \frac{3}{4}\left(-2 + 4U^2\right) + \frac{1}{16}\left(12 - 48U^2 + 16U^4\right),$$

i.e.:

$$P_k X = U^4,$$

and we obtain the exact representation. For Tchebichev polynomials of first
kind, the matrix and the vector are evaluated in the same way
and the solution is:

$$X_k = \left( \tfrac{3}{8},\ 0,\ \tfrac{1}{2},\ 0,\ \tfrac{1}{8},\ 0,\ 0 \right)^t,$$

which corresponds to:

$$P_k X = \frac{3}{8} + \frac{1}{2}\left(-1 + 2U^2\right) + \frac{1}{8}\left(1 - 8U^2 + 8U^4\right),$$

i.e.:

$$P_k X = U^4.$$

The same procedure applies to Tchebichev polynomials of second kind,
so that:

$$P_k X = \frac{1}{8} + \frac{3}{16}\left(-1 + 4U^2\right) + \frac{1}{16}\left(1 - 12U^2 + 16U^4\right),$$

i.e. the exact representation:

$$P_k X = U^4. \quad ■$$
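The Legendre coefficients of Example 3.7 can be recomputed without numerical integration, since all the required expectations reduce to moments of U. The sketch below is an illustration under stated assumptions: it uses $E(U^m) = 2/(m+2)$ (a consequence of $U = \sqrt{\omega}$ with $\omega$ uniform on (0, 1)) and builds the Legendre polynomials with Bonnet's recurrence $(n+1)P_{n+1} = (2n+1)\,u\,P_n - n\,P_{n-1}$, which is a standard identity not quoted in the text. All variable names are ours.

```matlab
% Setting of Example 3.7: U = sqrt(omega), X = U^4, E(U^m) = 2/(m+2).
k = 6;
L = zeros(k + 1);                   % row n+1: coefficients of P_n (ascending powers)
L(1, 1) = 1;                        % P_0 = 1
L(2, 2) = 1;                        % P_1 = u
for n = 1:k - 1
    uPn = [0, L(n + 1, 1:k)];       % multiplying by u shifts the coefficients
    L(n + 2, :) = ((2*n + 1) * uPn - n * L(n, :)) / (n + 1);
end
idx  = (0:k)';
Mom  = 2 ./ (idx + idx' + 2);       % E(U^(i+j))
MomX = 2 ./ (idx + 6);              % E(X U^i) = E(U^(i+4))
A = L * Mom * L';                   % (A_k)_{ij} = E(P_i(U) P_j(U))
B = L * MomX;                       % (B_k)_i   = E(X P_i(U))
x = A \ B;
disp(x');
```

The result should be close to (1/5, 0, 4/7, 0, 8/35, 0, 0), the solution stated in the example. Replacing the rows of L by the coefficients of another polynomial family gives the other cases of the example.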
Example 3.8.- The popular families of finite elements correspond to
increasing finite-dimensional subspaces. For instance, let us consider k > 0,
h = 1/k, $x_i = ih$, $0 \leq i \leq k$ and a family of P1 finite elements:

$$\varphi_i(s) = \begin{cases} 1 - \dfrac{|s - x_i|}{h}, & \text{if } |s - x_i| \leq h, \\ 0, & \text{otherwise.} \end{cases}$$

Let us consider again $\Omega = (0, 1)$, P((a, b)) = b - a (i.e. $P(d\omega) = \ell(d\omega)$,
where $\ell$ is the Lebesgue measure), $U(\omega) = \sqrt{\omega}$ and $X(\omega) = \omega^2$. We are
looking for:

$$P_k X = \sum_{j=0}^{k} x_j\,\varphi_j(U).$$

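For instance, when k = 10, we have:

$$A_k = \begin{pmatrix}
\tfrac{1}{600} & \tfrac{1}{600} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\tfrac{1}{600} & \tfrac{1}{75} & \tfrac{1}{200} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \tfrac{1}{200} & \tfrac{2}{75} & \tfrac{1}{120} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \tfrac{1}{120} & \tfrac{1}{25} & \tfrac{7}{600} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \tfrac{7}{600} & \tfrac{4}{75} & \tfrac{3}{200} & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \tfrac{3}{200} & \tfrac{1}{15} & \tfrac{11}{600} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \tfrac{11}{600} & \tfrac{2}{25} & \tfrac{13}{600} & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \tfrac{13}{600} & \tfrac{7}{75} & \tfrac{1}{40} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \tfrac{1}{40} & \tfrac{8}{75} & \tfrac{17}{600} & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \tfrac{17}{600} & \tfrac{3}{25} & \tfrac{19}{600} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \tfrac{19}{600} & \tfrac{13}{200}
\end{pmatrix},$$

$$B_k = \left( \tfrac{1}{21000000},\ \tfrac{3}{500000},\ \tfrac{23}{250000},\ \tfrac{289}{500000},\ \tfrac{283}{125000},\ \tfrac{667}{100000},\ \tfrac{4069}{250000},\ \tfrac{17381}{500000},\ \tfrac{4203}{62500},\ \tfrac{60267}{500000},\ \tfrac{594323}{7000000} \right)^t.$$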


The results obtained with k = 50 are shown in Figure 3.4. 





Figure 3.4. Finite element approximation. For a color version of the figure, see 
www.iste.co.uk/souzadecursi/quantification.zip 
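The matrix $A_k$ and the vector $B_k$ of Example 3.8 can be assembled element by element. The sketch below is a possible implementation, not the book's program: it uses the density 2u of $U = \sqrt{\omega}$ on (0, 1) and $X = U^4$, and it relies on a 4-point Gauss-Legendre rule (our choice), which integrates the polynomial integrands exactly on each element.

```matlab
% Setting of Example 3.8 with k = 10; density of U: 2u on (0,1); X = U^4.
k = 10;  h = 1 / k;
hat = @(i, s) max(0, 1 - abs(s - i*h) / h);        % P1 element varphi_i, i = 0..k
% 4-point Gauss-Legendre nodes and weights on (-1,1), exact up to degree 7
g = [-0.8611363115940526, -0.3399810435848563, 0.3399810435848563, 0.8611363115940526];
w = [ 0.3478548451374538,  0.6521451548625461, 0.6521451548625461, 0.3478548451374538];
A = zeros(k + 1);  B = zeros(k + 1, 1);
for e = 1:k                                        % element (x_{e-1}, x_e)
    u  = (e - 0.5) * h + 0.5 * h * g;              % quadrature points in the element
    wu = 0.5 * h * w .* (2 * u);                   % quadrature weights times density
    for i = e - 1:e                                % only two elements are nonzero here
        B(i + 1) = B(i + 1) + sum(wu .* u.^4 .* hat(i, u));
        for j = e - 1:e
            A(i + 1, j + 1) = A(i + 1, j + 1) + sum(wu .* hat(i, u) .* hat(j, u));
        end
    end
end
x = A \ B;                                         % coefficients x_0, ..., x_k
```

The entries obtained this way may be compared with the values displayed in the example, for instance A(1,1) = 1/600, A(2,2) = 1/75 and B(1) = 1/21000000. Only the two basis functions attached to each element contribute, which explains the tridiagonal structure of $A_k$.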
All the examples may be implemented in Matlab by using the programs
given in section 3.1.5.


3.2. Approximations based on statistical properties (moment matching 
method) 

The preceding approximations involve the evaluation of expressions that
depend on the joint distribution of the pair (U, X). For instance, the evaluation
of quantities such as f(x | U = u), or:

$$(U, X) = E(UX) = \int u\,x\,f(u, x)\,du\,dx$$

or, more generally,

$$(X, \varphi_i(U)) = E(X\varphi_i(U)) = \int x\,\varphi_i(u)\,f(u, x)\,du\,dx,$$

where f is the joint probability density of the pair (U, X).

When the distribution of the pair is not known a priori, it has to be evaluated
from samples. This may involve practical difficulties since, on the one hand,
the generation of samples is mandatory - with a cost - and, on the other hand,
the samples must be used for the estimation of these quantities - with an error.
In practice, the usual situation involves quantities whose distributions are
unknown or difficult to determine, and we have only a sample of the pair.

This remark suggests that in some situations, it may be useful to use a
procedure that does not involve the evaluation of quantities depending upon
the joint distribution of the pair and estimates all the necessary statistical
properties directly from a sample. For instance, we may determine an element
$PX \in S$ such that some statistics of X coincide with those of PX.

An example of such an approach is provided by the moment matching
method, which looks for a finite expansion such that chosen moments of X
and PX coincide: $M_i(X) = E(X^i) = E[(PX)^i] = M_i(PX)$ for some
chosen values of i (for instance $1 \leq i \leq n$). This kind of approximation is
connected with Levy's theorem (see section 1.12.3, [DE 92]) and the
convergence in law or convergence in distribution (see section 1.12.5): let us
recall that the characteristic function of X is (see [1.10]):

$$\varphi(t) = E(\exp(itX))$$

and that a sequence $\{X_n\}_{n \in \mathbb{N}}$ converges to X in law if and only if
$\varphi_n(t) = E(\exp(itX_n))$ converges pointwise to $\varphi(t)$ (i.e., $\varphi_n(t) \longrightarrow \varphi(t)$
a.e.). The characteristic function is closely connected to the moments. As
previously mentioned (see section 1.10):

- if $M_p(X) < \infty$, then $\varphi^{(p)}$ (derivative of order p of $\varphi$) satisfies
$\varphi^{(p)}(t) = i^p E(X^p e^{itX})$. In particular, $\varphi^{(p)}(0) = i^p M_p(X)$;

- if $M_p(X) < \infty$, $\forall p \in \mathbb{N}$ and the series $S(t) = \sum_{n \in \mathbb{N}} \frac{(it)^n}{n!} M_n(X)$
has a strictly positive convergence radius, then $\varphi(t) = S(t)$.

These two properties suggest that, on the one hand, it is possible to
represent $\varphi$ - and so, the law of X - by using the moments of X and, on the
other hand, to approximate the law of X by using an expansion based on its
first moments: a variable having the same moments as X is expected to have
a distribution that closely approaches the distribution of X. The first idea
(representation of the distribution of X by a series S(t) of moments) is
connected to the problem of the moments (see, for instance, [CHO 62]) and
the second idea has been exploited in the literature (see, for instance,
[GAV 03, GAV 08, GAV 09]).
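The link between the moments and the characteristic function can be illustrated on a case where both are known in closed form. The sketch below assumes, for illustration only, that X is uniform on (0, 1): then $M_n(X) = 1/(n+1)$ and $\varphi(t) = (e^{it} - 1)/(it)$, and the truncated series S(t) is compared with $\varphi(t)$.

```matlab
% Assumption (ours): X uniform on (0,1), so M_n(X) = 1/(n+1).
t    = 1.5;
nmax = 30;                                       % truncation order of the series
n    = 0:nmax;
Mn   = 1 ./ (n + 1);                             % moments M_0, ..., M_nmax
S    = sum(((1i * t).^n ./ factorial(n)) .* Mn); % truncated S(t) = sum (it)^n/n! M_n
phi_exact = (exp(1i * t) - 1) / (1i * t);        % exact characteristic function
disp(abs(S - phi_exact));
```

For this distribution the series has an infinite convergence radius, so the discrepancy is at the level of rounding errors. For distributions with heavier tails, the radius may be small or zero and the moments alone may not determine the law.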

From the numerical standpoint, we consider:

$$PX = \sum_{j=1}^{k} x_j\,\varphi_j(U) \in S; \quad \mathbf{X} = (x_1, \ldots, x_k)^t;$$

and we determine $\mathbf{X}$ such that:

$$M_p(X) = M_p(PX) \text{ for } 1 \leq p \leq n.$$

When k = n, these equations define a set of n nonlinear algebraic
equations that may be solved by standard methods. However, in practice, we
may consider a larger number of moments (n > k), and so, more equations
than unknowns. In this case, it is convenient to look for solutions based on a
minimization procedure. For instance, let us consider a map $T : \mathbb{R}^k \longrightarrow S$
given by:

$$\mathbf{v} = (v_1, \ldots, v_k)^t \longmapsto T(\mathbf{v}) = \sum_{j=1}^{k} v_j\,\varphi_j(U) \in S$$

and a map $\mathbf{M} : \mathbb{R}^k \longrightarrow \mathbb{R}^n$ given by:

$$\mathbf{v} = (v_1, \ldots, v_k)^t \longmapsto \mathbf{M}(\mathbf{v}) = \left( M_1(T(\mathbf{v})), \ldots, M_n(T(\mathbf{v})) \right)^t \in \mathbb{R}^n.$$

Taking:

$$\mathbf{M}_X = \left( M_1(X), \ldots, M_n(X) \right)^t \in \mathbb{R}^n,$$

we may introduce $J : \mathbb{R}^k \longrightarrow \mathbb{R}$ given by:

$$J(\mathbf{v}) = dist(\mathbf{M}(\mathbf{v}), \mathbf{M}_X),$$

where dist is a measure of distance: for instance, we may consider a norm
$\|\cdot\|_{\mathbb{R}^n}$ on $\mathbb{R}^n$ and define $dist(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|_{\mathbb{R}^n}$. In this case, we look for:

$$\mathbf{X} = \arg\min_{\mathbb{R}^k} J.$$

The main difficulty remains in the non-convexity of J: the quality of the 
approximation depends on the quality of the optimization - if the numerically 
determined point is far from a global optimum of J, the approximation has a 
poor quality. As a result, it becomes mandatory to use adapted optimization 
procedures, able to solve non-convex continuous problems. 

In addition, we observe that Levy’s theorem ensures the convergence in 
distribution, which is a weak convergence involving approximation of the 
cumulative distribution function of the variable, but not the approximation of 
the variables themselves : the moment matching method is not expected to 
provide a good approximation of the variables, but only of their distributions. 
In practice, we may obtain a good representation of the variables if an
excellent quality is obtained in the numerical solution of the nonlinear system
of algebraic equations (or in the global optimization problem), while the
approximation of the cumulative distribution function requires less
computational effort.
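The objective function J described above may be sketched in a few lines. The example is an illustration under assumptions of our own: a deterministic grid of values of $\omega$ plays the role of the sample, $U = \sqrt{\omega}$, $X = \omega^2$, the basis is $\varphi_j(U) = U^{j-1}$, dist is the Euclidean norm and the minimization uses fminsearch from an arbitrary starting point.

```matlab
% Assumptions (ours): grid "sample" of omega in (0,1), U = sqrt(omega), X = omega^2,
% monomial basis phi_j(U) = U^(j-1), Euclidean distance between moment vectors.
om = ((1:200)' - 0.5) / 200;
Us = sqrt(om);   Xs = om.^2;
n  = 5;  k = 5;                                  % n moments, k coefficients
moments = @(Y) mean(Y .^ (1:n), 1)';             % empirical moments M_1, ..., M_n
T   = @(v) (Us .^ (0:k-1)) * v(:);               % sample of T(v) = sum_j v_j phi_j(U)
M_X = moments(Xs);
J   = @(v) norm(moments(T(v)) - M_X);            % J(v) = dist(M(v), M_X)
v0  = [0 0 0 1 0];                               % arbitrary starting point: PX = U^3
v   = fminsearch(J, v0);
fprintf('J(v0) = %g, J(v) = %g, J(exact) = %g\n', J(v0), J(v), J([0 0 0 0 1]));
```

The exact coefficients (0, 0, 0, 0, 1) give J = 0 up to rounding, but a local search such as fminsearch is not guaranteed to reach them: as underlined in the text, J is non-convex and several coefficient vectors may produce nearly identical moments.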
Example 3.9.- Let us consider again $\Omega = (0, 1)$, P((a, b)) = b - a (i.e.
$P(d\omega) = \ell(d\omega)$ where $\ell$ is Lebesgue's measure), $U(\omega) = \sqrt{\omega}$ and
$X(\omega) = \omega^2$. If we look for an approximation of X by a polynomial function of
degree 5 of U, but the joint distribution of the pair is unknown, we must estimate
the values of $B_i = E(XU^i)$ from the sample, which introduces an error. For
instance, here are the results obtained for different values of the sample size ns:

ns       X
10       (-1.4309, 28.4332, -169.5479, 421.2838, -459.5838, 183.1038)
100      (-0.1089, 2.1753, -13.0411, 32.5765, -34.8053, 14.3105)
1,000    (-0.0105, 0.2108, -1.2644, 3.1608, -2.4766, 1.3905)


Figure 3.5. Polynomial approximation with B estimated. For a color version of
the figure, see www.iste.co.uk/souzadecursi/quantification.zip


The results are presented in Figure 3.5: it shows that the approximation is
poor until we use about 100 points. The situation may become more complex
if the distribution of U is also unknown. In this case, the matrix A has to be
estimated from the given data: the results for such a situation are shown in the
table given below.

ns       X
10       (0.5230, -0.0748, -14.9273, 46.0987, -49.9426, 19.3635)
100      (0.4373, -0.0170, -15.6027, 51.8225, -59.8655, 24.2747)
1,000    (0.4294, -0.0135, -15.4512, 51.4904, -59.6683, 24.2615)

As we can see, results are poor even for 1,000 data points. An alternative
approach consists, as mentioned, of using an approximation for which the n
first empirical moments of X coincide with those of the approximation. The
results for n = 5, minimizing the sum of the squares of the relative errors
(analogously to a $\chi^2$ measure of distance), are shown in Figure 3.7. The final
distance between the moments is 1E-13 and the quality of the approximation
is good - even the density of probability is relatively correct. ■


Figure 3.6. Polynomial approximation with A and B estimated. For a color
version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip

Figure 3.7. Approximation by moment matching. For a color version of the
figure, see www.iste.co.uk/souzadecursi/quantification.zip




Figure 3.8. Approximation by moment matching. For a color version of the 
figure, see www.iste.co.uk/souzadecursi/quantification.zip 


All the examples may be implemented in Matlab by using the programs
below. They assume that, on the one hand, $\varphi_k(U)$ is provided by a
subprogram phi(k,U) while, on the other hand, a sample
$\{(U_i, X_i) : 1 \leq i \leq ns\}$ of the pair (U, X) is available. In addition, a
subprogram dist(Y,Z) provides the distance between data vectors Y and Z
of the same length. The program uses fminsearch in order to minimize the
objective function. Of course, the reader may replace it by another
optimization program. Analogously, centered moments may be used instead
of simple ones by modifying the program empirical_moments.

Listing 3.5. Moment Matching

function x = expcoef(phi, N_X, Xs, Us, n, dist, maxit, maxf, tol_x, tol_f)
%
% determines the coefficients of the expansion
% using a sample (U_i, X_i) from (U,X) and
% moment matching
%
% IN:
% phi : basis function phi(k,U) - type anonymous function
% N_X : order of the expansion - type integer
% Xs : table of values of X - type array of double
% Us : table of values of U - type array of double
%      (Us(i), Xs(i)) is a variate from (U,X)
% n : number of moments to be considered - type integer
% dist : function evaluating the distance between data vectors - type anonymous function
% maxit : max of minimization iterations - type integer
% maxf : max of objective function evaluations - type integer
% tol_x : precision requested on the coefficients - type double
% tol_f : precision requested on the minimal value of the objective - type double
%
% OUT:
% x : table 1 x N_X of coefficients - type array of double
%
M_X = empirical_moments(Xs, n);
fobj = @(c) objective_m3(c, Us, phi, M_X, dist);
x0 = randn(1, N_X);    % random starting point: results may vary between runs
options = optimset('MaxIter', maxit, 'MaxFunEvals', maxf, 'TolX', tol_x, 'TolFun', tol_f);
x = fminsearch(fobj, x0, options);
end

function v = objective_m3(c, Us, phi, M_X, dist)
%
% determines the objective function to be minimized
%
% IN:
% c : table 1 x N_X of the coefficients of the expansion - type array of double
% Us : table ns x k of the sample points - type array of double
% phi : basis function phi(j,U) - type anonymous function
% M_X : table of moments to be approached - type array of double
% dist : function evaluating the distance between data vectors - type anonymous function
%
% OUT:
% v : distance between the moments of the expansion and M_X - type double
%
PXs = projection(c, Us, phi);
Ms = empirical_moments(PXs, length(M_X));
v = dist(Ms, M_X);
end

function m = empirical_moments(Y, n)
%
% determines the empirical moments
%
% IN:
% Y : vector of data - type array of double
% n : number of moments - type integer
%
% OUT:
% m : vector of moments - type array of double
%
m = zeros(n, 1);
for i = 1:n
    m(i) = mean(Y.^i);
end
end


3.3. Interpolation-based approximations (collocation)

An alternative use of a sample $\{(U_i, X_i) : i = 1, \ldots, ns\}$ consists of
determining:

$$PX(U) = \sum_{j=1}^{k} x_j\,\varphi_j(U) \in S$$

such that:

$$PX(U_i) = X_i, \quad i = 1, \ldots, ns.$$

In this case, $\mathbf{X} = (x_1, \ldots, x_k)^t$ is the solution of a linear system:

$$A\mathbf{X} = B, \quad A_{ij} = \varphi_j(U_i), \quad B_i = X_i \quad (1 \leq i \leq ns,\ 1 \leq j \leq k).$$

This linear system involves ns equations for k unknowns. For ns > k, the
number of equations is higher than the number of unknowns and the system is
overdetermined: we look for generalized solutions, such as, for instance,
least squares ones - in this case, the solution may be interpreted as a
discrete version of the Hilbertian approximations introduced in the case of
finite-dimensional subspaces.

When such an approach is adopted, other interpolation techniques may be
used, such as, for instance, spline approximations or collocation by intervals -
this last situation may be interpreted as a discrete version of the approximations
based on increasing sequences of finite-dimensional subspaces.

The wide variety of interpolation techniques precludes any attempt at a brief
summary, but the reader may refer to the literature in order to obtain more
information about interpolation. Matlab implementation may be performed as
follows: assume that, on the one hand, $\varphi_k(U)$ is provided by a subprogram
phi(k,U) while, on the other hand, a sample $\{(U_i, X_i) : 1 \leq i \leq ns\}$ of the
pair (U, X) is available. Then, we may use the code:

Listing 3.6. Collocation

function x = expcoef(phi, N_X, Xs, Us)
%
% determines the coefficients of the expansion
% using a sample (U_i, X_i) from (U,X)
%
% IN:
% phi : basis function phi(k,U) - type anonymous function
% N_X : order of the expansion - type integer
% Xs : table of values of X - type array of double
% Us : table of values of U - type array of double
%      (Us(i), Xs(i)) is a variate from (U,X)
%
% OUT:
% x : table 1 x N_X of coefficients - type array of double
%
A = collocation_matrix(phi, N_X, Us);
x = A \ Xs(:);    % least squares solution when ns > N_X
x = x';           % transpose to 1 x N_X vector
end

function A = collocation_matrix(phi, N_X, Us)
%
% creates the matrix of the linear system by using a sample
%
% IN:
% phi : basis function phi(k,U) - type anonymous function
% N_X : order of the expansion - type integer
% Us : table of values of U - type array of double
%
% OUT:
% A : table ns x N_X of values - type array of double
%     A(i,j) = phi(j, U_i)
%
ns = length(Us);    % number of sample points
A = zeros(ns, N_X);
for i = 1:ns
    for j = 1:N_X
        A(i, j) = phi(j, Us(i));
    end
end
end


Example 3.10.- Let us consider again $\Omega = (0, 1)$, P((a, b)) = b - a (i.e.
$P(d\omega) = \ell(d\omega)$ where $\ell$ is Lebesgue's measure), $U(\omega) = \sqrt{\omega}$ and
$X(\omega) = \omega^2$. If we look for the approximation of X by a polynomial function of
degree 5 of U, a simple idea consists of generating a sample $(U_i, X_i)$ for
conveniently chosen values of U. Then, we may interpolate the values of X. For
instance, let us consider a uniform grid on $\Omega$: $\omega_i = ih$, $0 \leq i \leq np$, h = 1/np and
the corresponding values $U_i = U(\omega_i)$ and $X_i = X(\omega_i)$. For a polynomial
function of degree d = 5 of U, we have:

X = (-0.0000, 0.0000, -0.0000, 0.0000, 1.0000, 0.0000)

The results are shown in Figures 3.9 and 3.10. ■
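The computation of this example may be reproduced with a few lines of Matlab. The sketch below is only one possible realization: the number of grid points np = 20 and the monomial basis $\varphi_j(U) = U^{j-1}$ are our assumptions, and the overdetermined system is solved in the least squares sense by the backslash operator.

```matlab
% Assumptions (ours): np = 20 grid intervals, monomial basis of degree 5.
np = 20;
om = (0:np)' / np;                  % uniform grid on Omega = (0,1)
Us = sqrt(om);                      % U_i = U(omega_i)
Xs = om.^2;                         % X_i = X(omega_i)
A  = Us .^ (0:5);                   % A(i,j) = phi_j(U_i) = U_i^(j-1)
x  = A \ Xs;                        % least squares solution (21 equations, 6 unknowns)
disp(x');
```

Since $X = U^4$ belongs to the space generated by the basis, the system is consistent and the coefficient of $U^4$ is recovered, in agreement with the values given in the example.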
Figure 3.9. Approximation by interpolation. For a color version of the figure,
see www.iste.co.uk/souzadecursi/quantification.zip



Figure 3.10. Approximation by interpolation. For a color version of the figure,
see www.iste.co.uk/souzadecursi/quantification.zip

Linear Algebraic Equations
Under Uncertainty


In this chapter, we consider a given square n x n matrix, $A = (A_{ij}) \in \mathcal{M}(n, n)$
and classical associated problems such as:

- linear equations:

$$AX = B,$$

where $B = (B_i) \in \mathcal{M}(n, 1)$ is a known n x 1 vector and $X = (X_i) \in \mathcal{M}(n, 1)$
is an unknown n x 1 vector (situations where A is not square are also
examined - see below);

- eigenvalue or eigenvector problems:

$$X \neq 0 \text{ and } AX = \lambda X,$$

where $\lambda$ is an unknown real number and X is an unknown n x 1 vector.

We will examine two basically different situations: 

- In the first situation, we consider that A or B may contain uncertainty 
generated by a random vector v - i.e. A = A(v), B = B(v), where v is a 
random vector - and we are interested in the determination of the distribution 
of probability or statistics of the solution X. 

- In the second situation, A and B are deterministic, but we are interested 
in stochastic methods for the determination of the unknowns. 
Let us recall that there are classical methods for dealing with the first
situation, based on the condition number of A, but these methods do not
produce information about the distribution of probability of X. The condition
number is defined as (see, for instance, [DAT 09])

$$cond(A) = \|A\|\,\|A^{-1}\|$$

and verifies (see, for instance, [DAT 09])

$$\frac{\|\Delta X\|}{\|X\|} \leq cond(A)\,\frac{\|\Delta B\|}{\|B\|} \quad \text{if } A(\Delta X) = \Delta B.$$

This inequality gives an estimate of the error $\Delta X$ in X when an error $\Delta B$
arises in B. As observed, it does not carry any information about the distribution
of X. Analogously, we have, for infinitesimal variations of A (see, for instance,
[DAT 09]):

$$\frac{\|dX\|}{\|X\|} \leq cond(A)\,\frac{\|dA\|}{\|A\|}.$$

In the same way as previously, this inequality does not provide information
about the distribution of X.
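The first bound can be checked numerically on a small system. The matrix, right-hand side and perturbation below are hypothetical values chosen for illustration; the 2-norm is used for both the vectors and the condition number.

```matlab
% Hypothetical data: a 2 x 2 system and a small perturbation of B.
A  = [1 2; 3 4.001];
B  = [1; 2];
dB = 1e-6 * [1; -1];                % error Delta B in the right-hand side
X  = A \ B;
dX = A \ dB;                        % A (Delta X) = Delta B
lhs = norm(dX) / norm(X);           % relative error in X
rhs = cond(A) * norm(dB) / norm(B); % bound given by the condition number
fprintf('%g <= %g\n', lhs, rhs);
```

The bound only controls the size of the error: it says nothing about how the values of X are distributed when B is random, which is the purpose of the methods presented in this chapter.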


4.1. Representation of the solution of uncertain linear systems

In this section, we consider the linear system:

$$AX = B,$$

where $A = (A_{ij}) \in \mathcal{M}(n, n)$ is a square n x n matrix, $B = (B_i) \in \mathcal{M}(n, 1)$
is a vector of dimension n x 1 and $X = (X_i) \in \mathcal{M}(n, 1)$ is the vector of the
unknowns. We assume that there exists a random vector $v = (v_1, \ldots, v_{nr})$ such
that A = A(v), B = B(v). Our objective is the numerical determination of
the probability distribution of X in three basic situations:

- when the distribution of v is known and, in addition, the functions
$v \longrightarrow A(v)$ and $v \longrightarrow B(v)$ are known;

- when only a sample of v is given, but the functions $v \longrightarrow A(v)$ and
$v \longrightarrow B(v)$ are known;

- when a sample of the pair (A, B) is given.

In all these situations, this objective may be attained by the construction
of an approximation PX of X in a convenient subspace of random variables.
For instance, if the components $v_i$ take their values on an interval $\Omega = (a, b)$,
then we may use the procedure introduced in Chapter 3: let us consider a total
family $F = \{\varphi_k\}_{k \in \mathbb{N}}$ from the functional space $L^2(\Omega)$ and an approximation
PX belonging to the finite-dimensional subspace

$$S = \left[ \{\varphi_1(\xi), \ldots, \varphi_{N_X}(\xi)\} \right]^n = \left\{ \sum_{k=1}^{N_X} D_k\,\varphi_k(\xi) : D_k \in \mathbb{R}^n,\ 1 \leq k \leq N_X \right\}, \quad [4.1]$$

where $N_X \in \mathbb{N}^*$ and $\xi$ is conveniently chosen. Let us consider
$\varphi(\xi) = (\varphi_1(\xi), \ldots, \varphi_{N_X}(\xi))^t \in \mathcal{M}(N_X, 1)$. Then:

$$Y \in S \iff Y = D\varphi(\xi), \quad D = (D_{ij}) \in \mathcal{M}(n, N_X).$$

Thus

$$PX = \chi\,\varphi(\xi), \quad [4.2]$$

where $\chi = (\chi_{ij}) \in \mathcal{M}(n, N_X)$ are unknown coefficients to be determined.
Once $\chi$ is calculated, we may generate

$$PX = \sum_{k=1}^{N_X} \chi_k\,\varphi_k(\xi), \quad \text{i.e.} \quad (PX)_j = \sum_{k=1}^{N_X} \chi_{jk}\,\varphi_k(\xi). \quad [4.3]$$


4.1.1. Case where the distributions are known 

In this section, we consider the situation where the distribution of v and the
functions $v \longrightarrow A(v)$ and $v \longrightarrow B(v)$ are given.

As previously observed (see Chapter 3), a convenient random variable is
$\xi = v$. Let us assume that v takes its values on $\Omega \subset \mathbb{R}^{nr}$ and
$X(v) \in V = [L^2(\Omega)]^n$. In this case, let $F = \{\varphi_k\}_{k \in \mathbb{N}}$ be a total family
of the functional space $L^2(\Omega)$. We have

$$(Y, AX) = E(Y^t AX) = E(Y^t B) = (Y, B), \quad \forall\, Y \in V$$

and X is the solution of the variational equation:

$$X \in V \text{ and } E(Y^t AX) = E(Y^t B), \quad \forall\, Y \in V. \quad [4.4]$$

This equation is approximated as

$$PX \in S \text{ and } E(Y^t A\,(PX)) = E(Y^t B), \quad \forall\, Y \in S. \quad [4.5]$$

Since (from equation [4.2]),

$$A\,(PX) = A\chi\,\varphi(v),$$

we have

$$E\left(\varphi(v)^t D^t A \chi\,\varphi(v)\right) = E\left(\varphi(v)^t D^t B\right), \quad \forall\, D \in \mathcal{M}(n, N_X). \quad [4.6]$$

Thus

$$E\left(\varphi(v)^t D^t A \chi\,\varphi(v)\right) = \sum_{i,j=1}^{n} \sum_{k,m=1}^{N_X} E\left(\varphi_m(v)\,D_{im} A_{ij}\,\chi_{jk}\,\varphi_k(v)\right)$$

and

$$E\left(\varphi(v)^t D^t B\right) = \sum_{i=1}^{n} \sum_{m=1}^{N_X} E\left(\varphi_m(v)\,D_{im} B_i\right).$$

So, by taking $D_{im} = \delta_{ir}\,\delta_{ms}$ in equation [4.6], we obtain

$$\sum_{j=1}^{n} \sum_{k=1}^{N_X} E\left(\varphi_s(v)\,A_{rj}\,\chi_{jk}\,\varphi_k(v)\right) = E\left(\varphi_s(v)\,B_r\right), \quad 1 \leq r \leq n,\ 1 \leq s \leq N_X.$$

Using the notation

$$\mathcal{A}_{rsjk} = E\left(\varphi_s(v)\,A_{rj}\,\varphi_k(v)\right), \quad \mathcal{B}_{rs} = E\left(\varphi_s(v)\,B_r\right), \quad [4.7]$$

we have

$$\sum_{j=1}^{n} \sum_{k=1}^{N_X} \mathcal{A}_{rsjk}\,\chi_{jk} = \mathcal{B}_{rs}, \quad 1 \leq r \leq n,\ 1 \leq s \leq N_X.$$

These equations form a linear system: let us consider

$$ind(j, k) = (k - 1)\,n + j$$

and the matrices $M = (M_{\alpha\beta}) \in \mathcal{M}(nN_X, nN_X)$, $C = (C_\beta) \in \mathcal{M}(nN_X, 1)$,
$N = (N_\alpha) \in \mathcal{M}(nN_X, 1)$ given by

$$M_{\alpha\beta} = \mathcal{A}_{rsjk}, \quad N_\alpha = \mathcal{B}_{rs}, \quad C_\beta = \chi_{jk}, \quad \alpha = ind(r, s), \quad \beta = ind(j, k).$$

Then, we have

$$MC = N. \quad [4.8]$$
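The assembly of system [4.8] can be illustrated on a small case. The sketch below is a self-contained illustration under hypothetical data, not the book's program: n = 2, a scalar v uniform on (0, 1), A(v) = [2+v 1; 1 3], B(v) = (1, v)^t and the monomial basis $\varphi_k(v) = v^{k-1}$. The expectations are evaluated with Matlab's integral function.

```matlab
% Hypothetical data: n = 2, v uniform on (0,1), A(v) = [2+v 1; 1 3], B(v) = [1; v].
n = 2;  N_X = 4;
Ae  = {@(v) 2 + v, @(v) ones(size(v)); @(v) ones(size(v)), @(v) 3 * ones(size(v))};
Be  = {@(v) ones(size(v)); @(v) v};
phi = @(k, v) v .^ (k - 1);                      % basis phi_k(v) = v^(k-1)
ind = @(j, k) (k - 1) * n + j;                   % index map of the text
E   = @(f) integral(f, 0, 1);                    % expectation (density 1 on (0,1))
M = zeros(n * N_X);  N = zeros(n * N_X, 1);
for r = 1:n
    for s = 1:N_X
        al = ind(r, s);
        N(al) = E(@(v) phi(s, v) .* Be{r}(v));                        % B_rs
        for j = 1:n
            for k = 1:N_X
                M(al, ind(j, k)) = E(@(v) phi(s, v) .* Ae{r, j}(v) .* phi(k, v));  % A_rsjk
            end
        end
    end
end
chi = reshape(M \ N, n, N_X);                    % chi(j,k) = C(ind(j,k))
v   = 0.5;
PX  = chi * phi((1:N_X)', v);                    % PX(v) = chi * phi(v)
Xex = [2 + v, 1; 1, 3] \ [1; v];                 % exact solution for this v
disp([PX, Xex]);
```

The two columns displayed should nearly coincide: X(v) is a smooth (rational) function of v here, so a low-order polynomial expansion is already accurate. The reshape call relies on the column-major storage of Matlab, which matches ind(j, k) = (k - 1)n + j.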

The solution of this linear system determines C and, as a result, $\chi$. This
approach is implemented under Matlab by using the programs phi(k,U) and
scalprod(Y,Z) previously introduced. Let us recall that the first program
evaluates $\varphi_k(U)$, while the second program evaluates the scalar product of
two variables Y and Z: for instance, if $Y = Y(\omega)$, $Z = Z(\omega)$ and the density
$f_\omega$ of $\omega$ is defined on (a, b), then scalprod(Y,Z) returns the value of
$\int_a^b Y(\omega)\,Z(\omega)\,f_\omega(\omega)\,d\omega$. In this case, we may use the code:

Listing 4.1. UQ of a Linear System by integration

function chi = expcoef(A, B, phi, n, N_X, U, f_omega, a, b)
%
% determines the coefficients of the expansion
% by integrating the variables
%
% IN:
% A : n x n matrix of the linear system - type anonymous function
% B : n x 1 second member of the linear system - type anonymous function
% phi : basis, phi(k,U) evaluates phi_k(U) - type function
% n : number of unknowns - type integer
% N_X : order of the expansion - type integer
% U : the random variable argument of the expansion - type anonymous function
% f_omega : density of omega - type anonymous function
% a,b : bounds of integration - type double
%
% OUT:
% chi : n x N_X matrix of the coefficients - type array of double
%
[M, N] = variational_matrices(A, B, phi, n, N_X, U, f_omega, a, b);
C = M \ N;
chi = zeros(n, N_X);
for r = 1:n
    for s = 1:N_X
        alfa = index_map(r, s, n, N_X);
        chi(r, s) = C(alfa);
    end;
end;
return;
end

function [M, N] = variational_matrices(A, B, phi, n, N_X, U, f_omega, a, b)
%
% determines matrices M, N such that
% M(alpha, beta) = E(phi_s A_rj phi_k), N(alpha) = E(B_r phi_s)
% alpha = index_map(r, s), beta = index_map(j, k)
%
% IN:
% A : n x n matrix of the linear system - type anonymous function
% B : n x 1 second member of the linear system - type anonymous function
% phi : basis, phi(k,U) evaluates phi_k(U) - type function
% n : number of unknowns - type integer
% N_X : order of the expansion - type integer
% U : the random variable argument of the expansion - type anonymous function
% f_omega : density of omega - type anonymous function
% a,b : bounds of integration - type double
%
% OUT:
% M : nN_X x nN_X matrix of the linear system - type array of double
% N : nN_X x 1 second member of the linear system - type array of double
%
M = zeros(n*N_X, n*N_X);
N = zeros(n*N_X, 1);
for r = 1:n
    B_r = @(om) select_index([r], B, U(om));
    for s = 1:N_X
        phi_s = @(om) phi(s, U(om));
        alfa = index_map(r, s, n, N_X);
        N(alfa) = scalprod(B_r, phi_s, f_omega, a, b);
        for j = 1:n
            A_rj = @(om) select_index([r, j], A, U(om));
            for k = 1:N_X
                % product phi_k * phi_s, integrated against A_rj
                phi_sk = @(om) phi(k, U(om)) * phi_s(om);
                betta = index_map(j, k, n, N_X);
                M(alfa, betta) = scalprod(A_rj, phi_sk, f_omega, a, b);
            end;
        end;
    end;
end;
return;
end

function val = select_index(index, M, U)
%
% selects the element M_index(U) from matrix M(U)
%
% IN:
% index - the vector of indices - type array of integer
% M - the matrix as function of U - type anonymous function
% U - the random variable - type double
%
% OUT:
% val - the value of M_index(U) - type double
%
mmm = M(U);
if length(index) == 1
    val = mmm(index(1));
elseif length(index) == 2
    val = mmm(index(1), index(2));
end;
return;
end


Example 4.1.- Let us consider the case where $v = (v_1)$ is uniformly distributed on (-1, 1): its density is

$$f(v) = \begin{cases} 1/2, & \text{if } -1 < v < 1; \\ 0, & \text{otherwise.} \end{cases}$$

Let us consider

$$A = \begin{pmatrix} 2v_1 + 3 & v_1 + 3 \\ v_1 + 1 & v_1 + 2 \end{pmatrix}, \quad B = \begin{pmatrix} v_1 \\ 1 \end{pmatrix}.$$

In this case,

$$\det(A) = v_1^2 + 3v_1 + 3 > 1$$

and

$$X = \frac{1}{v_1^2 + 3v_1 + 3} \begin{pmatrix} v_1^2 + v_1 - 3 \\ -v_1^2 + v_1 + 3 \end{pmatrix}.$$

The method presented above is applied with a polynomial basis:

$$\varphi_k(v) = \left(\frac{v + 1}{2}\right)^{k-1}.$$

The results obtained with p = 5 are shown in Figure 4.1. The relative mean quadratic error is 0.5%.
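The procedure of this example can be sketched end to end in a few lines. The sketch below is written in Python rather than Matlab and is not the book's program: it assumes that the matrix reads A = [[2v+3, v+3], [v+1, v+2]] with second member B = (v, 1), that the basis is ((v+1)/2)^(k-1), and it replaces scalprod by a midpoint quadrature; the helper names `solve`, `mean` and `PX` are hypothetical.

```python
import math


def solve(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    m = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, m):
            f = aug[r][c] / aug[c][c]
            for q in range(c, m + 1):
                aug[r][q] -= f * aug[c][q]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(aug[r][q] * x[q] for q in range(r + 1, m))
        x[r] = (aug[r][m] - s) / aug[r][r]
    return x


n, p = 2, 5
A = lambda v: [[2 * v + 3, v + 3], [v + 1, v + 2]]   # assumed reading of A
B = lambda v: [v, 1.0]                               # assumed reading of B
phi = lambda k, v: ((v + 1) / 2) ** (k - 1)          # polynomial basis, k = 1..p
ind = lambda j, k: (k - 1) * n + j                   # 1-based index map

# Midpoint rule: for v uniform on (-1, 1), E(f(v)) is the average over the nodes.
nq = 2000
nodes = [-1 + (i + 0.5) * 2 / nq for i in range(nq)]
mean = lambda f: sum(f(v) for v in nodes) / nq

# Assemble M(alpha, beta) = E(phi_s A_rj phi_k) and N(alpha) = E(phi_s B_r).
M = [[0.0] * (n * p) for _ in range(n * p)]
N = [0.0] * (n * p)
for r in range(1, n + 1):
    for s in range(1, p + 1):
        al = ind(r, s) - 1
        N[al] = mean(lambda v: phi(s, v) * B(v)[r - 1])
        for j in range(1, n + 1):
            for k in range(1, p + 1):
                M[al][ind(j, k) - 1] = mean(
                    lambda v: phi(s, v) * A(v)[r - 1][j - 1] * phi(k, v))

C = solve(M, N)


def PX(v):
    """Approximation PX(v) = chi phi(v), with chi_{jk} = C[ind(j, k)]."""
    return [sum(C[ind(j, k) - 1] * phi(k, v) for k in range(1, p + 1))
            for j in range(1, n + 1)]


def X_exact(v):
    d = v * v + 3 * v + 3
    return [(v * v + v - 3) / d, (-v * v + v + 3) / d]


num = mean(lambda v: sum((a - b) ** 2 for a, b in zip(PX(v), X_exact(v))))
den = mean(lambda v: sum(b ** 2 for b in X_exact(v)))
rel_err = math.sqrt(num / den)
```

Here `rel_err` plays the role of the relative mean quadratic error quoted in the example; its value depends on the quadrature and on the reading of A and B assumed above.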




Figure 4.1. Results obtained in example 4.1. For a color version of the figure, 
see www.iste.co.uk/souzadecursi/quantification.zip 


Example 4.2.- Let us consider a bar, i.e. a structure modeled as a one-dimensional continuous medium having the geometry of a line segment and admitting only longitudinal displacements. A bar of length $\ell > 0$ is described by a variable $s \in (0, \ell)$ and its field of displacements is a function $x : (0, \ell) \to \mathbb{R}$. The mechanical behavior of the bar is characterized by its elasticity modulus E, density $\rho$ and cross section S. The equilibrium of a bar under the loads of gravity g and a force F applied to one of its extremities is described by the following equation:

$$\frac{d}{ds}\left(ES \frac{dx}{ds}\right) + \rho S g = 0 \text{ on } (0, \ell), \quad ES \frac{dx}{ds}(\ell) = F, \quad x(0) = 0.$$

In real structures, all these parameters are affected by variations: E, S, $\rho$, F may be considered as random variables taking their values on given intervals. In addition, internal and/or external variability may occur: different values may be observed at different points of the same bar, and/or different values may be obtained for two structures expected to be identical. For instance, wood structures present internal variability when considering layers corresponding to different years and external variability when considering different trees.

When the equilibrium equations of the bar are solved by a finite element method, the interval $(0, \ell)$ is discretized in n subintervals of length $h = \ell/n$ corresponding to the nodes $s_i = ih$, $0 \le i \le n$. The unknowns are the approximated values $X_i \approx x(s_i)$, for $1 \le i \le n$, which verify the linear system AX = B, where the stiffness matrix A is obtained by assembling the elementary stiffness matrices issued from each element $I_i$, while B is constructed by assembling elementary mass and force matrices from each element $I_i$.

In order to study the effects of an external variability, we may consider F as deterministic and E, S, $\rho$ as constant for each structure, but varying among the structures. In this case,

$$A = \frac{ES}{h} \begin{pmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & -1 & 2 & -1 \\ 0 & \cdots & 0 & -1 & 1 \end{pmatrix}, \quad B = \frac{\rho S g h}{2} \begin{pmatrix} 2 \\ 2 \\ \vdots \\ 2 \\ 1 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ F \end{pmatrix},$$

where E, S and $\rho$ are random variables. Taking $v = (v_1, v_2)$, $v_1 = ES$, $v_2 = \rho S$, this situation corresponds to the case under consideration in this section. For instance, let us assume that each $v_i$ is an independent random variable uniformly distributed on an interval $(a_i, b_i)$. Let us consider two strictly positive integers $n_1 > 0$, $n_2 > 0$ and the functions given by ($0 \le r \le n_1$, $0 \le s \le n_2$)

$$\varphi_k(v) = \left(\frac{v_1 - a_1}{b_1 - a_1}\right)^r \left(\frac{v_2 - a_2}{b_2 - a_2}\right)^s.$$

The procedure introduced in this section may be applied. For instance, let us consider the situation where $a_1 = 2.2$ MN, $b_1 = 2.7$ MN, $a_2 = 0.11$ kN/m, $b_2 = 0.14$ kN/m, F = 1 kN and $\ell = 5$ m. Using $n_1 = n_2 = 3$, n = 10, we obtain the results shown in Figure 4.2. The mean quadratic error is less than 0.02%.
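For one realization of the parameters, the finite element system of the bar is tridiagonal and can be solved directly. The sketch below is in Python and only illustrates the deterministic building block: it assumes the usual linear-element stiffness (ES/h) tridiag(-1, 2, -1) with a last diagonal entry equal to ES/h, treats `w` as the distributed load per unit length, and uses hypothetical values ES = 2.5 MN and w = 125 N/m chosen inside the intervals quoted in the example.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm. lower[i] multiplies x[i-1] and upper[i] multiplies x[i+1] in row i."""
    m = len(diag)
    d, b = diag[:], rhs[:]
    for i in range(1, m):
        f = lower[i] / d[i - 1]
        d[i] -= f * upper[i - 1]
        b[i] -= f * b[i - 1]
    x = [0.0] * m
    x[-1] = b[-1] / d[-1]
    for i in range(m - 2, -1, -1):
        x[i] = (b[i] - upper[i] * x[i + 1]) / d[i]
    return x


def bar_displacements(ES, w, F, ell, n):
    """Nodal displacements X_1..X_n of the bar for one realization of (ES, w)."""
    h = ell / n
    k = ES / h
    diag = [2 * k] * (n - 1) + [k]          # last node carries a single element
    lower = [0.0] + [-k] * (n - 1)
    upper = [-k] * (n - 1) + [0.0]
    rhs = [w * h] * (n - 1) + [w * h / 2 + F]
    return solve_tridiagonal(lower, diag, upper, rhs)


def exact_displacement(ES, w, F, ell, s):
    """Closed-form solution of (ES x')' + w = 0, x(0) = 0, ES x'(ell) = F."""
    return (w / ES) * (ell * s - s * s / 2) + F * s / ES


ES, w, F, ell, n = 2.5e6, 125.0, 1000.0, 5.0, 10   # hypothetical realization, SI units
X = bar_displacements(ES, w, F, ell, n)
tip = X[-1]
```

For this one-dimensional problem the nodal values agree with the closed-form solution, so `tip` can serve as the quantity whose dependence on the random parameters is then expanded on the basis.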



Figure 4.2. Results obtained in example 4.2 (independent variables). For a 
color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip 


In practice, the variables E and $\rho$ are not independent. In real situations, analysis of samples of wood (often of a given type from the same forest) shows a significant correlation between these two parameters and we may consider approximating E by a function of $\rho$. When a sample from the pair $(E, \rho)$ is available, such data may be used in order to generate a function $E = E(\rho)$ (see Chapter 3). For instance, let us assume that $E \approx (1840 + 15\rho) \times 10^6$ in SI units. In this case, the single random variable to be considered is $v = (v_1)$, $v_1 = \rho S$. Let us assume that the variable $\rho S$ is uniformly distributed on (a, b), a = 0.11 kN/m, b = 0.14 kN/m, F = 1 kN, $\ell = 5$ m. We may apply the approach presented in this section with a polynomial basis:

$$\varphi_k(v) = \left(\frac{v - a}{b - a}\right)^{k-1}.$$

The results obtained using n = 10, p = 5 may be found in Figure 4.3. The mean quadratic error is lower than 1E-5%.


[Plot: displacement (in μm) of the last point of the bar versus ρ (in kN/m), p = 5, n = 10]

Figure 4.3. Results obtained in example 4.2 (E as a function of p). For a color 
version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip 


Internal variability may be studied in an analogous manner. 


4.1.2. Case where the distributions are unknown 

When considering variables having unknown densities, the means appearing in the equations defining the matrices M and N of the linear system, which determines the coefficients $\chi$ of the projection PX, cannot be evaluated directly: they have to be estimated from a sample. The means involved in equation [4.7] are approximated by using empirical means obtained from the sample in order to generate the linear system [4.8]. For instance, these unknown values may be estimated by using a sample $\left((v^1, A^1, B^1), \ldots, (v^{ns}, A^{ns}, B^{ns})\right)$ of ns variates from the triplet (v, A, B):

$$\mathcal{A}_{rsjk} \approx \frac{1}{ns} \sum_{i=1}^{ns} \varphi_s(v^i) A^i_{rj} \varphi_k(v^i), \quad \mathcal{B}_{rs} \approx \frac{1}{ns} \sum_{i=1}^{ns} \varphi_s(v^i) B^i_r.$$

If the functions $v \to A(v)$ and $v \to B(v)$ are explicitly known, all the quantities may be approximated by using the sample $(v^1, \ldots, v^{ns})$ of ns variates from v:

$$\mathcal{A}_{rsjk} \approx \frac{1}{ns} \sum_{i=1}^{ns} \varphi_s(v^i) A_{rj}(v^i) \varphi_k(v^i), \quad \mathcal{B}_{rs} \approx \frac{1}{ns} \sum_{i=1}^{ns} \varphi_s(v^i) B_r(v^i).$$

Once these values have been evaluated, the procedure is the same as in the preceding section: the solution of the linear system [4.8] determines C and, as a result, $\chi$. This approach is implemented in Matlab by using the program scalprod(Y, Z) evaluating the scalar product by using a sample. In this case:
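The replacement of the means by empirical means can be sketched as follows. The code is in Python rather than Matlab and is only a sketch: the matrix A is the one assumed for example 4.1 (read as [[2v+3, v+3], [v+1, v+2]]), the basis is ((v+1)/2)^(k-1), and a regular grid stands in for the sample; `empirical_mean` and `A_hat` are hypothetical names.

```python
def phi(k, v):
    """Polynomial basis on (-1, 1), k = 1, 2, ..."""
    return ((v + 1) / 2) ** (k - 1)


def A(v):
    """Assumed reading of the matrix of example 4.1."""
    return [[2 * v + 3, v + 3], [v + 1, v + 2]]


def empirical_mean(f, sample):
    """(1/ns) * sum of f over the variates of the sample."""
    return sum(f(v) for v in sample) / len(sample)


# A regular grid of midpoints on (-1, 1) stands in for a sample from v.
ns = 1000
sample = [-1 + (i + 0.5) * 2 / ns for i in range(ns)]


def A_hat(r, s, j, k):
    """Empirical estimate of E(phi_s(v) A_rj(v) phi_k(v))."""
    return empirical_mean(
        lambda v: phi(s, v) * A(v)[r - 1][j - 1] * phi(k, v), sample)
```

For a uniform v on (-1, 1), the exact values are E(2v + 3) = 3 for (r, s, j, k) = (1, 1, 1, 1) and E((v + 1)(2v + 3)/2) = 11/6 for (1, 2, 1, 1), which the estimates approach as ns grows.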

Listing 4.2. UQ of a Linear System by using a sample

function chi = expcoef(A, B, phi, n, N_X, Us)
%
% determines the coefficients of the expansion
% by using a sample
%
% IN:
% A : n x n matrix of the linear system - type anonymous function
% B : n x 1 second member of the linear system - type anonymous function
% phi : basis, phi(k,U) evaluates phi_k(U) - type function
% n : number of unknowns - type integer
% N_X : order of the expansion - type integer
% Us : table of values of U - type array of double
%      Us(:, i) is a variate from U
%
% OUT:
% chi : n x N_X matrix of the coefficients - type array of double
%
[M, N] = variational_matrices(A, B, phi, n, N_X, Us);
C = M \ N;
chi = zeros(n, N_X);
for r = 1:n
    for s = 1:N_X
        alfa = index_map(r, s, n, N_X);
        chi(r, s) = C(alfa);
    end;
end;
return;
end

function [M, N] = variational_matrices(A, B, phi, n, N_X, Us)
%
% determines matrices M, N such that
% M(alpha, beta) = E(phi_s A_rj phi_k), N(alpha) = E(B_r phi_s)
% alpha = index_map(r, s), beta = index_map(j, k)
%
% IN:
% A : n x n matrix of the linear system - type anonymous function
% B : n x 1 second member of the linear system - type anonymous function
% phi : basis, phi(k,U) evaluates phi_k(U) - type function
% n : number of unknowns - type integer
% N_X : order of the expansion - type integer
% Us : table of values of U - type array of double
%      Us(:, i) is a variate from U
%
% OUT:
% M : nN_X x nN_X matrix of the linear system - type array of double
% N : nN_X x 1 second member of the linear system - type array of double
%
M = zeros(n*N_X, n*N_X);
N = zeros(n*N_X, 1);
for r = 1:n
    B_r = @(U) select_index([r], B, U);
    for s = 1:N_X
        phi_s = @(U) phi(s, U);
        alfa = index_map(r, s, n, N_X);
        f1 = @(U) phi_s(U) * B_r(U);
        Y = map(f1, Us, 1);
        N(alfa) = mean(Y);
        for j = 1:n
            A_rj = @(U) select_index([r, j], A, U);
            for k = 1:N_X
                phi_sk = @(U) phi(k, U) * phi_s(U);
                f1 = @(U) phi_sk(U) * A_rj(U);
                Y = map(f1, Us, 1);
                betta = index_map(j, k, n, N_X);
                M(alfa, betta) = mean(Y);
            end;
        end;
    end;
end;
return;
end


Example 4.3.- Let us consider again the situation studied in example 4.1: here, we assume that the distribution of v is unknown, but that a sample of ns variates from v has been furnished. In this example, such a sample is generated by using the intrinsic function rand of Matlab: the instruction v = a + (b-a)*rand(ns,1); generates a vector of ns real numbers from the uniform distribution on (a, b). The results obtained with p = 5, ns = 25 are shown in Figure 4.4. The relative mean quadratic error between data and the approximation is less than 0.5% and a comparison with the exact values leads to an analogous result.




Figure 4.4. Results obtained in example 4.3 (sample of 25 random points). For a color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip


Figure 4.5 shows the results obtained with p = 5, ns = 100. The relative 
mean quadratic error between data and the approximation is less than 0.5% 
and a comparison with the exact values leads to an analogous result. 
Figure 4.5. Results obtained in example 4.3 (sample of 100 random points). For a color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip


As previously observed, we may also consider the use of a uniform grid, analogously to the use of a grid of uniformly distributed collocation points: for instance, when using ns = 11 equally spaced "collocation" points, we obtain the results shown in Figure 4.6. The relative mean quadratic error is less than 0.6% when the approximation is compared with the data or the exact solution.




Figure 4.6. Results obtained in example 4.3 (11 equidistributed points). For a color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip

Example 4.4.- Let us consider the situation presented in example 4.2: analogously to example 4.3, we assume that the distribution of v is unknown, but a sample formed by ns variates from the variable v is available. In this example, such a sample is generated in the same way as in the preceding example. The results obtained in the case where $a_1 = 2.2$ MN, $b_1 = 2.7$ MN, $a_2 = 0.11$ kN/m, $b_2 = 0.14$ kN/m, F = 1 kN, $\ell = 5$ m, $n_1 = n_2 = 3$, n = 10, ns = 100 are shown in Figure 4.7. The relative mean quadratic error is less than 0.02%.



Figure 4.7. Results obtained in example 4.4 (sample of 100 random points). For a color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip


When using a uniform grid of 36 points, the relative mean quadratic error 
remains lower than 0.02%. The results are shown in Figure 4.8. 



Figure 4.8. Results obtained in example 4.4 (uniform grid of 36 points). For a 
color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip 
The situation is analogous when E is considered as a function of $\rho$. With the same parameters given in example 4.2, a uniform grid of 11 equally spaced values of $\rho$ furnishes the results in Figure 4.9, with a relative mean quadratic error of 1E-5% for the comparison between the approximation and the exact solution.


[Plot: displacement (in μm) of the last point of the bar versus ρ (in kN/m), p = 5, ns = 10]


Figure 4.9. Results obtained in example 4.4 (11 equidistributed points). For a 
color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip 


As in the preceding situation, internal variability may be studied in an 
analogous manner. 

4.2. Representation of eigenvalues and eigenvectors of uncertain matrices 

In this section, we consider the determination of the pairs $(\Lambda_i, X_i)$, where $\Lambda_i$ is an eigenvalue and $X_i$ is an eigenvector of the square matrix $A = (A_{ij}) \in \mathcal{M}(n, n)$:

$$A X_i = \Lambda_i X_i, \quad X_i \ne 0.$$

In order to simplify the notation, we do not write the index i and we use the notation X instead of $X_i$ and $\Lambda$ in place of $\Lambda_i$: the readers must keep in mind that the method constructed in the following considers eigenvalues and eigenvectors. In addition, we denote $\|y\| = \sqrt{y^t.y}$ for a vector $y \in \mathbb{R}^n$. We assume that the matrix A depends on a random vector $v = (v_1, \ldots, v_{nr})$ (i.e. A = A(v), where v is random).

Eigenvalues and eigenvectors of random matrices have been studied in the literature by using various procedures and approaches. Examples include analysis of the eigenvalues and eigenvectors of random matrices and their distributions in various situations (see, for instance, [FOR 09, CHO 09, HAN 78]), construction of estimators and determination of statistics (see, for instance, [BHA 10, TAO 09, TAO 10]), polynomial decomposition for the determination of the coefficients of polynomial chaos expansions associated with random variables (see, for instance, [RAH 06, RAH 07, RAH 09, RAH 11]) and analysis of continuous dynamical systems (see, for instance, [MAN 93b, MAN 93a, CRO 10]). In the following, we consider the classical methods of representation, based on the approaches by collocation, moment fitting or projection; we also consider the adaptation of classical numerical methods for the determination of eigenvalues and eigenvectors, such as iterated powers, subspace iteration and Krylov iterations. Consistently with the presentation given in Chapter 3, the eigenvalues and eigenvectors studied are considered as unknown functions of the random variables involved (we have $\Lambda = \Lambda(v)$ and X = X(v)) and a formal representation in terms of a series of these random variables is introduced in order to generate an approximation of these functions and reduce the problem to the determination of the coefficients of the terms of the series. Analogously to the preceding sections, we also consider the situation where only one sample is available.

Let us assume, without loss of generality, that the eigenvalues and eigenvectors under consideration are all real, i.e. their imaginary parts are null (the method presented in the following extends directly to complex eigenvalues and eigenvectors, with a similar implementation given by the replacement of $\mathbb{R}$ by $\mathbb{C}$). As previously mentioned, we may consider an approximation PX corresponding to the truncation of an infinite series giving the representation of the solution on a Hilbert basis (see equations [4.1]-[4.3]):

$$PX \in S_X = \left[\{\varphi_k(\xi)\}_{1 \le k \le N_X}\right]^n = \left\{ \sum_{k=1}^{N_X} D_k \varphi_k(\xi) : D_k \in \mathbb{R}^n, \ 1 \le k \le N_X \right\}.$$

In an analogous manner, we consider $P\Lambda$ such that

$$P\Lambda \in S_\Lambda = \left[\{\psi_k(\xi)\}_{1 \le k \le N_\Lambda}\right],$$

$$PX = \sum_{k=1}^{N_X} \chi_k \varphi_k(\xi), \quad P\Lambda = \sum_{k=1}^{N_\Lambda} \ell_k \psi_k(\xi). \qquad [4.10]$$

Taking $\varphi(\xi) = (\varphi_1(\xi), \ldots, \varphi_{N_X}(\xi))^t \in \mathcal{M}(N_X, 1)$ and $\psi(\xi) = (\psi_1(\xi), \ldots, \psi_{N_\Lambda}(\xi))^t \in \mathcal{M}(N_\Lambda, 1)$, we have:

$$PX = \chi \varphi(\xi) \quad \text{and} \quad P\Lambda = \ell \psi(\xi), \qquad [4.11]$$

where $\chi = (\chi_{ij}) \in \mathcal{M}(n, N_X)$ and $\ell = (\ell_i) \in \mathcal{M}(1, N_\Lambda)$ are the unknowns to be determined.

In practice, the same family and the same degree of approximation may be used for both the elements X and $\Lambda$.


4.2.1. Determination of the distribution of eigenvalues and eigenvectors by collocation

The simplest way to determine an approximation of the distribution of an eigenvalue or eigenvector consists of the use of a sample in order to generate a projection. For instance, let us assume that we are interested in the dominant eigenvalue: if we may obtain a sample $S = (\xi_1, \ldots, \xi_{ns})$ of ns variates of $\xi$ (or if such a sample may be generated), we may use it for the generation of a sample of the dominant eigenvalue $\Lambda$ and, then, for the determination of the values of $\ell$ corresponding to the best approximation having the form given in equation [4.11], with respect to the data furnished by the sample $(\Lambda_1, \ldots, \Lambda_{ns})$. For instance, we may determine the coefficients of the expansion by solving the linear system

$$\Lambda_i = P\Lambda(\xi_i) = \ell \psi(\xi_i) \quad (i = 1, \ldots, ns).$$

This linear system is expressed as

$$L \ell^t = \Lambda, \quad L_{ij} = \psi_j(\xi_i)$$

and it is, in general, overdetermined (more equations than unknowns). We may look for a least squares solution, with a Tikhonov regularization parameter $\varepsilon > 0$. In this case, we have

$$\left(L^t L + \varepsilon \, \mathrm{Id}\right) \ell^t = L^t \Lambda.$$

This approach may be implemented in Matlab by using the programs given in section 3.3.
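The regularized least squares step can be sketched independently of the programs of section 3.3, which are not reproduced here. The following Python sketch builds the normal equations for L_ij = psi_j(xi_i) and solves them with a small Gaussian elimination; the names `solve`, `collocation_fit` and the monomial basis `psi` are assumptions made for the illustration.

```python
def solve(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    m = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, m):
            f = aug[r][c] / aug[c][c]
            for q in range(c, m + 1):
                aug[r][q] -= f * aug[c][q]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(aug[r][q] * x[q] for q in range(r + 1, m))
        x[r] = (aug[r][m] - s) / aug[r][r]
    return x


def collocation_fit(xi, lam, psi, n_terms, eps=0.0):
    """Solve (L^t L + eps Id) l = L^t lam, with L_ij = psi(j, xi_i)."""
    L = [[psi(j, x) for j in range(1, n_terms + 1)] for x in xi]
    G = [[sum(row[a] * row[b] for row in L) + (eps if a == b else 0.0)
          for b in range(n_terms)] for a in range(n_terms)]
    rhs = [sum(row[a] * y for row, y in zip(L, lam)) for a in range(n_terms)]
    return solve(G, rhs)


psi = lambda j, x: x ** (j - 1)                 # hypothetical monomial basis
xi = [i / 20 for i in range(21)]                # 21 equally spaced points
lam = [1 + 2 * x - 3 * x * x for x in xi]       # synthetic data, exactly quadratic

coef = collocation_fit(xi, lam, psi, 3)             # eps = 0: plain least squares
coef_reg = collocation_fit(xi, lam, psi, 3, 1e-3)   # Tikhonov-regularized
```

With exactly representable data and eps = 0, the fit returns the generating coefficients (1, 2, -3); a positive eps shrinks the coefficient vector, which is the price paid for a better conditioned system.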

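Example 4.5.- Let us consider a single random variable $v = u^2 \in \mathbb{R}$, u uniformly distributed on (0, 2), and the random matrix

$$A(v) = \begin{pmatrix} \cos\left(\frac{\pi}{2} v\right) & -\sin\left(\frac{\pi}{2} v\right) \\ \sin\left(\frac{\pi}{2} v\right) & \cos\left(\frac{\pi}{2} v\right) \end{pmatrix} \begin{pmatrix} 2 - v & 0 \\ 0 & v \end{pmatrix} \begin{pmatrix} \cos\left(\frac{\pi}{2} v\right) & \sin\left(\frac{\pi}{2} v\right) \\ -\sin\left(\frac{\pi}{2} v\right) & \cos\left(\frac{\pi}{2} v\right) \end{pmatrix}. \qquad [4.12]$$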


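The eigenvalues of A are $2 - v$ and $v$. Thus, the dominating eigenvalue $\overline{\Lambda}$ and the minimal eigenvalue $\underline{\Lambda}$ are

$$\overline{\Lambda} = \begin{cases} 2 - v, & \text{if } 0 < v < 1; \\ v, & \text{if } 1 < v < 4, \end{cases} \qquad \underline{\Lambda} = \begin{cases} v, & \text{if } 0 < v < 1; \\ 2 - v, & \text{if } 1 < v < 4. \end{cases} \qquad [4.13]$$

In this case, the exact cumulative distribution function of the dominating eigenvalue is

$$F_{\overline{\Lambda}}(\lambda) = \begin{cases} \frac{1}{2}\left(\sqrt{\lambda} - \sqrt{2 - \lambda}\right), & \text{if } 1 < \lambda < 2; \\ \frac{1}{2}\sqrt{\lambda}, & \text{if } 2 < \lambda < 4. \end{cases} \qquad [4.14]$$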
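The cumulative distribution function of the dominating eigenvalue can be checked numerically. The Python sketch below relies only on the statements that the eigenvalues are 2 - v and v, with v = u^2 and u uniform on (0, 2); it assumes that the second branch of the distribution function is sqrt(lambda)/2, which follows from P(u <= sqrt(lambda)), and uses a regular grid of values of u in place of random variates.

```python
import math


def dominant_eigenvalue(v):
    """Eigenvalue of largest modulus among 2 - v and v."""
    return max(2 - v, v, key=abs)


def exact_cdf(lam):
    """Distribution function of the dominating eigenvalue for v = u^2, u ~ U(0, 2)."""
    if lam <= 1:
        return 0.0
    if lam < 2:
        return 0.5 * (math.sqrt(lam) - math.sqrt(2 - lam))
    if lam < 4:
        return 0.5 * math.sqrt(lam)
    return 1.0


# Regular grid of midpoints for u on (0, 2), standing in for a large sample.
ns = 20000
u_values = [(i + 0.5) * 2 / ns for i in range(ns)]
lams = [dominant_eigenvalue(u * u) for u in u_values]


def empirical_cdf(lam):
    """Fraction of the sample not exceeding lam."""
    return sum(1 for y in lams if y <= lam) / ns
```

Comparing `empirical_cdf` and `exact_cdf` at a few points of (1, 4) gives a quick consistency check of the two branches.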


Let $(v_1, \ldots, v_{ns})$ be a sample of ns variates from v, which leads to the dominating eigenvalues $(\Lambda_1, \ldots, \Lambda_{ns})$. These values may be used in order to generate and solve the linear system

$$\Lambda_i = P\Lambda(v_i) = \ell \psi(v_i).$$

As previously stated, this linear system is, in general, overdetermined and has more equations than unknowns. The results obtained by using a sample of ns = 21 realizations are given in Figures 4.10 (results corresponding to a Tikhonov regularizing parameter equal to zero) and 4.11 (results corresponding to a Tikhonov regularizing parameter of 1E-3). The figure on the left compares the variable $\Lambda$ to the projection $P\Lambda$, while the figure on the right compares the cumulative distribution function of the approximation to the exact one. We use an approximation by a polynomial of degree 8.



Figure 4.10. Results obtained in example 4.5 (random sample of 
21 points). For a color version of the figure, 
see www.iste.co.uk/souzadecursi/quantification.zip 




Figure 4.11. Results obtained in example 4.5 (sample of 
21 random points). For a color version of the figure, 
see www.iste.co.uk/souzadecursi/quantification.zip 


The results given by a sample $v_i = u_i^2$ generated from the equally spaced points $(u_1, \ldots, u_{ns})$ are shown in Figure 4.12.

Figure 4.12. Results obtained in example 4.5 (21 equidistributed points). For a color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip


If we are interested in the eigenvectors, we may consider a sample of pairs $(\Lambda_i, X_i)$ such that $\|X_i\| = 1$ and $\max\{X_i.e_1, X_i.e_2\} > 0$. Then, we may use an analogous procedure and approximate the eigenvectors by using the representation given in equation [4.3]. An example of results is given in Figure 4.13: they are obtained by using ns = 51 equally spaced values of u and a trigonometrical basis of $N_X = 23$ elements.




Figure 4.13. Results obtained in example 4.5 (51 equidistributed points). For a 
color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip 
4.2.2. Determination of the distribution of eigenvalues and eigenvectors by moment fitting

A variation of the method presented in the previous section consists of using a sample $\Lambda = (\Lambda_1, \ldots, \Lambda_{ns})$ in order to generate a representation by the moment matching method: a sample provides the empirical moments $M^e = (M^e_1, M^e_2, \ldots)$, $M^e_i = \frac{1}{ns} \sum_{j=1}^{ns} (\Lambda_j)^i$, and the approximated values $M^a(\ell) = (M^a_1, M^a_2, \ldots)$, $M^a_i = \frac{1}{ns} \sum_{j=1}^{ns} \left(\ell \psi(v_j)\right)^i$, generated by using the representation $\ell$. These values may be used in order to determine the coefficients of the expansion that fit the values $M^a(\ell)$ to $M^e$: we may solve the nonlinear algebraical system $M^a(\ell) = M^e$ or, as an alternative, minimize an objective function $d(M^a(\ell), M^e)$, which measures a pseudo-distance between them. This approach may be implemented in Matlab by using the programs given in section 3.2.
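The two families of moments and one possible pseudo-distance can be written down in a few lines. The Python sketch below is not the program of section 3.2: the function names, the monomial basis `psi`, the choice of a sum of squared relative errors as pseudo-distance and the synthetic data are all assumptions made for the illustration.

```python
def empirical_moments(sample, n_moments):
    """M_i = (1/ns) * sum_j (sample_j)^i for i = 1..n_moments."""
    ns = len(sample)
    return [sum(y ** i for y in sample) / ns for i in range(1, n_moments + 1)]


def approx_moments(coef, psi, v_sample, n_moments):
    """Moments of the approximation l psi(v_j), with l given by coef."""
    values = [sum(c * psi(j + 1, v) for j, c in enumerate(coef)) for v in v_sample]
    return empirical_moments(values, n_moments)


def pseudo_distance(coef, psi, v_sample, target):
    """Sum of squared relative errors between approximated and empirical moments."""
    approx = approx_moments(coef, psi, v_sample, len(target))
    return sum(((a - e) / e) ** 2 for a, e in zip(approx, target))


psi = lambda j, v: v ** (j - 1)                       # hypothetical monomial basis
v_sample = [(i + 0.5) / 21 for i in range(21)]        # 21 points in (0, 1)
lam_sample = [1 + 2 * v for v in v_sample]            # synthetic "eigenvalue" data
M_e = empirical_moments(lam_sample, 4)

d_exact = pseudo_distance([1.0, 2.0], psi, v_sample, M_e)   # generating coefficients
d_other = pseudo_distance([1.0, 1.0], psi, v_sample, M_e)   # a different candidate
```

A minimizer of `pseudo_distance` over the coefficients would then play the role of the fitted representation; the optimizer itself is deliberately left out of this sketch.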

Example 4.6.- Let us consider again the situation described in example 4.5. We compare four different methods for moment fitting: three methods are based on the minimization of a pseudo-distance (minimization of the relative error, minimization of the sum of relative errors and minimization of the absolute error) and the fourth method is the numerical solution of the nonlinear algebraical system introduced above (see also section 3.2). The results corresponding to a sample of ns = 21 points are shown in Figure 4.14: we observe that the eigenvalue itself is not correctly approximated, but the results are good for its cumulative distribution function, as observed in section 3.2. The results given by a sample of ns = 21 equally spaced values of u are shown in Figure 4.15. Finally, the results obtained by using 51 equidistributed values of u are shown in Figure 4.16. In all these situations, better results are obtained when solving the nonlinear algebraical system of equations.


4.2.3. Representation of extreme eigenvalues by optimization techniques 

Let $F : \mathbb{R}^n \to \mathbb{R}$ be given by

$$F(X) = X^t.A.X \, / \, X^t.X. \qquad [4.15]$$

F is Rayleigh’s quotient associated with the matrix A. Its minimum value coincides with the minimal eigenvalue $\underline{\Lambda}$ of A, and it is attained when X coincides with an eigenvector associated with $\underline{\Lambda}$ (see, for instance, [GRI 02]). In an analogous manner, its maximum value coincides with the dominating eigenvalue $\overline{\Lambda}$ of the matrix A and it is attained when X is one of the associated eigenvectors.
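The extremal property of Rayleigh's quotient is easy to observe on a small symmetric matrix. The Python sketch below uses a hypothetical 2 x 2 matrix with eigenvalues 1 and 3 and scans unit vectors on a grid of angles; the matrix and the scan are illustrative choices, not taken from the text.

```python
import math


def rayleigh(A, X):
    """Rayleigh's quotient F(X) = X^t.A.X / X^t.X."""
    AX = [sum(a * x for a, x in zip(row, X)) for row in A]
    return sum(x * y for x, y in zip(X, AX)) / sum(x * x for x in X)


A = [[2.0, 1.0], [1.0, 2.0]]          # symmetric, eigenvalues 1 and 3

# Scan the unit vectors (cos t, sin t), t = 0, 1, ..., 179 degrees.
angles = [math.pi * i / 180 for i in range(180)]
values = [rayleigh(A, [math.cos(t), math.sin(t)]) for t in angles]

f_min, f_max = min(values), max(values)
```

On this matrix F(cos t, sin t) = 2 + sin 2t, so the scan reaches the minimal eigenvalue at 135 degrees, along (1, -1), and the dominating one at 45 degrees, along (1, 1).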






Figure 4.14. Results obtained in example 4.6 (21 random points). For a color 
version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip 




Figure 4.15. Results obtained in example 4.6 (21 equidistributed points ). For a 
color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip 


Rayleigh’s quotient provides an alternative approach for the numerical determination of the distribution of extreme eigenvalues by minimizing or maximizing the mean E(F(PX)). For instance, we may use the representation given in equations [4.10]-[4.11] in order to introduce

$$f(\chi) = E\left(F\left(\chi \varphi(v)\right)\right). \qquad [4.16]$$
Figure 4.16. Results obtained in example 4.6 (51 equidistributed points). For a color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip


Then, we may numerically determine $\chi$ such that f is minimal or maximal: the solution defines PX. The numerical optimization may be carried out by using, for instance, the methods presented in [LOP 11]. In addition, on the one hand, the minimum may be restrained to a ball $\chi_i^t.\chi_i \le r$ (r > 0) and, on the other hand, singular value decompositions (SVD) may be used: if $A(v) = W(v)S(v)U(v)^t$, with $W(v)^t W(v) = U(v)^t U(v) = \mathrm{Id}$, then, for $B(v) = U(v)^t W(v) S(v)$ and $Y = U(v)^t X$,

$$F(X) = G(Y) = Y^t.B.Y \, / \, Y^t.Y \qquad [4.17]$$

and B may be used instead of A. The algorithm corresponding to this method is algorithm 4.1.

Algorithm 4.1. Rayleigh’s quotient optimization

Require: $N_X > 0$, kmax > 0, precmin > 0, $\chi^{(0)} \in \mathcal{M}(n, N_X)$;
Require: a sample $v_s$ of ns variates from v;
Require: a method for the optimization of f;
generate: $\chi$ minimizing f (equation [4.16]);
generate: PX associated to $\chi$;
$\Lambda := PX^t.A.PX \, / \, PX^t.PX$;
return PX, $\Lambda$
An alternative use of Rayleigh’s quotient consists of its use for the generation of a sample of $\underline{\Lambda}$ or $\overline{\Lambda}$ (by minimization or maximization): then, the sample may be used for collocation, as in section 3.3. For instance:

- either minimize $f_i(\chi) = F\left(\chi \varphi(v_i)\right)$ for each i: each minimization produces a variate $\Lambda_i$ of $\Lambda$;

- or minimize $F_i(X) = X^t.A(v_i).X \, / \, X^t.X$ for each i: in this case too, each minimization generates a variate $\Lambda_i$ of $\Lambda$.

Example 4.7.- Let us consider again the situation described in example 4.5. Here, we are interested in determining the extreme (maximal and minimal) eigenvalues, which correspond to the maximization and minimization of Rayleigh’s quotient (equation [4.17]). We use the approach introduced in [LOP 11], with a sample of ns = 21 equally spaced values of u. The results obtained are shown in Figures 4.17 and 4.18, respectively.




Figure 4.17. Minimal eigenvalue in example 4.7 (21 equidistant points). For a color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip


4.2.4. Power iterations

A second classical method for the determination of the dominating eigenvalue when A possesses n different eigenvalues is the method of power iterations, which consists of iterations starting from an initial vector $X^{(0)}$:

$$T^{(k+1)} = A.X^{(k)}, \quad X^{(k+1)} = \frac{T^{(k+1)}}{\left\|T^{(k+1)}\right\|}. \qquad [4.18]$$
Figure 4.18. Maximal eigenvalue in example 4.7 (21 equidistant points). For a color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip


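Let us consider the representation introduced in equations [4.10]-[4.11]: we define $X^{(k+1)}(v) = \chi^{(k+1)} \varphi(v)$ and, in an analogous manner, $T^{(k+1)}(v) = t^{(k+1)} \varphi(v)$. Then, for $1 \le p \le n$, we have:

$$\sum_{j=1}^{N_X} t^{(k+1)}_{pj} \varphi_j(v) = \sum_{j=1}^{N_X} \sum_{q=1}^{n} A_{pq}(v) \chi^{(k)}_{qj} \varphi_j(v). \qquad [4.19]$$

Multiplying both sides by $\varphi_i(v)$, we obtain:

$$\sum_{j=1}^{N_X} t^{(k+1)}_{pj} \varphi_j(v) \varphi_i(v) = \sum_{j=1}^{N_X} \sum_{q=1}^{n} A_{pq}(v) \chi^{(k)}_{qj} \varphi_j(v) \varphi_i(v) \qquad [4.20]$$

and, by taking the mean of both sides, it results that

$$\sum_{j=1}^{N_X} \sum_{q=1}^{n} \mathcal{C}_{piqj} t^{(k+1)}_{qj} = \sum_{j=1}^{N_X} \sum_{q=1}^{n} \mathcal{D}_{piqj} \chi^{(k)}_{qj}, \qquad [4.21]$$

where

$$\mathcal{C}_{piqj} = \delta_{pq} E\left(\varphi_i(v) \varphi_j(v)\right), \quad \mathcal{D}_{piqj} = E\left(A_{pq}(v) \varphi_j(v) \varphi_i(v)\right). \qquad [4.22]$$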
Let us introduce the transformation $\mathrm{ind}(a, b) = a + (b - 1)N_X$: we denote, for $r = \mathrm{ind}(p, i)$ and $s = \mathrm{ind}(q, j)$,

$$C_{rs} = \mathcal{C}_{piqj}, \quad D_{rs} = \mathcal{D}_{piqj}, \quad t^{(k+1)}_s = t^{(k+1)}_{qj}, \quad x^{(k)}_s = \chi^{(k)}_{qj}, \qquad [4.23]$$

and equation [4.21] is expressed as

$$C t^{(k+1)} = D x^{(k)}. \qquad [4.24]$$

Once $t^{(k+1)}$ is evaluated, it may be normalized in order to generate

$$x^{(k+1)} = t^{(k+1)} \, / \, \left\|t^{(k+1)}\right\|, \quad \chi^{(k+1)}_{qj} = x^{(k+1)}_s. \qquad [4.25]$$

Equations [4.24]-[4.25] define an iterative procedure: $X^{(k)}$ is given, which determines $\chi^{(k)}$ and, as a result, $x^{(k)}$; then $t^{(k+1)}$ is given by equation [4.24]. Thus, $x^{(k+1)}$ and $\chi^{(k+1)}$ are given by equation [4.25], which furnishes $X^{(k+1)}$. At the end of the iterations, the eigenvalue is approximated by Rayleigh’s quotient defined in equation [4.15]: $\Lambda \approx F(X)$. Here too, SVD may be used: A(v) may be replaced by B(v).

A deflating approach may also be used in order to generate other eigenvalues: once X and $\Lambda$ are determined, we may consider $\tilde{A} = A - \Lambda X.X^t \, / \, X^t.X$. Then, the dominating eigenvalue of $\tilde{A}$ is the eigenvalue of A with the second largest modulus, and the procedure presented may be applied to $\tilde{A}$ in order to generate a representation for the second eigenvalue.
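The deflation step can be illustrated on a deterministic matrix. The Python sketch below assumes a symmetric matrix, for which removing the dominant eigenpair leaves the other eigenpairs unchanged; the 2 x 2 matrix and the names `deflate` and `matvec` are hypothetical.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def deflate(A, lam, X):
    """Return A - lam X.X^t / X^t.X for an eigenpair (lam, X) of a symmetric A."""
    xtx = sum(x * x for x in X)
    m = len(X)
    return [[A[i][j] - lam * X[i] * X[j] / xtx for j in range(m)] for i in range(m)]


A = [[2.0, 1.0], [1.0, 2.0]]       # eigenpairs: (3, (1, 1)) and (1, (1, -1))
A_defl = deflate(A, 3.0, [1.0, 1.0])

# The removed direction is now in the kernel, the other eigenpair is untouched.
image_of_removed = matvec(A_defl, [1.0, 1.0])
image_of_second = matvec(A_defl, [1.0, -1.0])
```

After deflation, power iterations applied to `A_defl` converge to the eigenvalue 1, i.e. to the second eigenvalue of the original matrix.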

Finally, we observe that the determination of the minimal eigenvalue may be carried out by using the inverse $A^{-1}$ of A instead of A itself. In practice, the determination of the inverse matrix may be avoided by using inverse iterations: $A.T^{(k+1)} = X^{(k)}$, and we have $D t^{(k+1)} = C x^{(k)}$ instead of equation [4.24].

The algorithm corresponding to this method is shown in algorithm 4.2. 


As in the preceding situation, an alternative consists of using power iterations as a generator of variates: a sample is obtained in this way and a collocation or moment matching method may be used (see sections 3.3 and 3.2).

Algorithm 4.2. Power Iterations

Require: $N_X > 0$, kmax > 0, precmin > 0, $\chi^{(0)} \in \mathcal{M}(n, N_X)$;
local: k, prec, t, xnew;
generate: C and D;
generate: $x^{(0)}$ associated to $\chi^{(0)}$;
k := 0;
x := $x^{(0)}$;
prec := precmin + 1;
while k < kmax and prec > precmin do
    determine t: Ct := D.x (or Dt := C.x for the inverse iterations);
    xnew := t / ||t||;
    prec := ||xnew - x||;
    x := xnew;
    k := k + 1;
end while
generate: $\chi$ associated to x;
generate: PX associated to $\chi$;
$\Lambda := PX^t.A.PX \, / \, PX^t.PX$;
return X, $\Lambda$
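When power iterations are used as a generator of variates, the loop of the algorithm is simply applied to each realization A(v_i). The Python sketch below follows that deterministic variant, not the coefficient-space version with the matrices C and D; it assumes the matrix of example 4.5 with a rotation angle read as (pi/2) v, and the function names and the starting vector are hypothetical.

```python
import math


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def norm(x):
    return math.sqrt(sum(a * a for a in x))


def rayleigh(A, x):
    return sum(a * b for a, b in zip(x, matvec(A, x))) / sum(a * a for a in x)


def power_iterations(A, x0, kmax=500, precmin=1e-13):
    """T = A.X, X = T / ||T||, stopped on kmax or on the change between iterates."""
    x = [a / norm(x0) for a in x0]
    for _ in range(kmax):
        t = matvec(A, x)
        nt = norm(t)
        xnew = [a / nt for a in t]
        prec = norm([a - b for a, b in zip(xnew, x)])
        x = xnew
        if prec <= precmin:
            break
    return x, rayleigh(A, x)


def A_of(v):
    """R(theta) diag(2 - v, v) R(theta)^t, with the assumed angle theta = (pi/2) v."""
    theta = 0.5 * math.pi * v
    c, s = math.cos(theta), math.sin(theta)
    d1, d2 = 2 - v, v
    return [[c * c * d1 + s * s * d2, c * s * (d1 - d2)],
            [c * s * (d1 - d2), s * s * d1 + c * c * d2]]


# 21 equally spaced values of u in (0, 2); each v = u^2 yields one variate.
u_grid = [2 * (i + 0.5) / 21 for i in range(21)]
sample = [power_iterations(A_of(u * u), [1.0, 0.3])[1] for u in u_grid]
```

The list `sample` can then be passed to a collocation or moment matching step. Since the dominating eigenvalue is positive here, the iterates do not alternate in sign and the stopping test on ||xnew - x|| is meaningful; a negative dominating eigenvalue would require a sign-insensitive test.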


Example 4.8.- Let us consider again the situation described in example 4.5. The results obtained for the minimal and maximal eigenvalues are shown in Figures 4.19-4.22. In Figures 4.19 and 4.20, the means have been evaluated by numerical integration using the intrinsic function quad of Matlab. In Figures 4.21 and 4.22, these means have been estimated by using a sample of equally spaced values of u.


4.2.5. Subspace iterations and Krylov iterations 

A simple extension of power iterations is obtained by the simultaneous use of a set of initial vectors (in the place of a single vector): we do not consider a solitary initial vector, but a set $S^{(0)} = \left\{X_1^{(0)}, \ldots, X_{kd}^{(0)}\right\}$ of kd initial vectors. $S^{(0)}$ may be generated in order to form an orthonormal set by applying Gram-Schmidt’s procedure to a given set $\left\{T_1^{(0)}, \ldots, T_{kd}^{(0)}\right\}$ of linearly independent vectors provided by the user. Some usual methods for the generation of $S^{(0)}$ are given in the following:

- random choice;

- eigenvectors of an auxiliary matrix;

- Krylov vectors: $X_1^{(0)}$ is given and $X_{i+1}^{(0)} = A.X_i^{(0)}$.




Figure 4.19. Minimal eigenvalue in example 4.8 (numerical integration).
For a color version of the figure, see
www.iste.co.uk/souzadecursi/quantification.zip

Figure 4.20. Maximal eigenvalue in example 4.8 (numerical integration).
For a color version of the figure, see
www.iste.co.uk/souzadecursi/quantification.zip

Figure 4.21. Minimal eigenvalue in example 4.8 (21 equidistant points). For a
color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip

Figure 4.22. Maximal eigenvalue in example 4.8 (21 equidistant points). For a
color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip

At step k, $S^{(k)} = \{X_1^{(k)}, \ldots, X_{k_d}^{(k)}\}$ is given and
$S^{(k+1)} = \{X_1^{(k+1)}, \ldots, X_{k_d}^{(k+1)}\}$ is obtained by applying
Gram-Schmidt's orthonormalization procedure to the set
$T^{(k+1)} = \{T_1^{(k+1)}, \ldots, T_{k_d}^{(k+1)}\}$,
generated by one iteration of the power method applied to the elements of
$S^{(k)}$: $T_i^{(k+1)}$ is the result of the application of the procedure
corresponding to equations [4.23]-[4.24] to $X_i^{(k)}$. The algorithm
corresponding to this method is algorithm 4.3.
Algorithm 4.3. Subspace Iterations

Require: kmax > 0, precmin > 0, $\chi^{(0)} \in \mathcal{M}(n, N_X)$;
Require: Nx > 0, kmax > 0, precmin > 0, np > 0;
local : k, prec, t, xnew;
generate : C and D;
generate $T^{0} = \{t_1^{(0)}, \ldots, t_{np}^{(0)}\} \subset \mathcal{M}(n, N_X)$;
for i := 1 to np do
    generate : $T_i^{(0)}$ associated to $t_i^{(0)}$
end for
generate $S^{0} = \{\chi_1^{(0)}, \ldots, \chi_{np}^{(0)}\}$ by Gram-Schmidt's
    orthonormalization of $\{t_1^{(0)}, \ldots, t_{np}^{(0)}\}$;
k := 0;
prec := precmin + 1;
while k < kmax and prec > precmin do
    for i := 1 to np do
        generate : X associated to $\chi_i^{(k)}$;
        determine t : C t := D.$\chi_i^{(k)}$;
        generate : T associated to t;
        $\eta_i^{(k)}$ := t
    end for
    generate $T^{k} = \{\eta_1^{(k)}, \ldots, \eta_{np}^{(k)}\}$;
    generate $S^{(k+1)} = \{\chi_1^{(k+1)}, \ldots, \chi_{np}^{(k+1)}\}$ by Gram-Schmidt's
        orthonormalization of $T^{k}$;
    prec := 0;
    for i := 1 to np do
        prec := prec + $\|\chi_i^{(k+1)} - \chi_i^{(k)}\|$
    end for;
    k := k + 1;
end while
for i := 1 to np do
    generate : $PX_i$ associated to $\chi_i^{(k)}$
    $\lambda_i$ := $PX_i^t.A.PX_i / PX_i^t.PX_i$;
end for
return $X_1, \ldots, X_{np}$, $\lambda_1, \ldots, \lambda_{np}$
As in the preceding situations, the SVD may be used and we may replace
A(v) by B(v), previously defined. In addition, the procedure may also be
used in order to generate samples of vectors and eigenvectors to be used in
collocation or moment matching approaches (sections 3.3 and 3.2).

Example 4.9.- Let us consider again the situation presented in example 4.5.
We want to determine simultaneously the maximal and minimal eigenvalues
by using the starting set suggested by Krylov. The results obtained are shown
in Figures 4.23-4.26. In Figures 4.23 and 4.24, the means have
been evaluated by numerical integration using the intrinsic function quad of
Matlab. In Figures 4.25 and 4.26, these means have been estimated by using a
sample of equally spaced values of u.




Figure 4.23. Minimal eigenvalue in example 4.9 (numerical integration). For a
color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip

Figure 4.24. Maximal eigenvalue in example 4.9 (numerical integration).
For a color version of the figure, see
www.iste.co.uk/souzadecursi/quantification.zip

Figure 4.25. Minimal eigenvalue in example 4.9 (21 equidistributed points).
For a color version of the figure, see
www.iste.co.uk/souzadecursi/quantification.zip

Figure 4.26. Maximal eigenvalue in example 4.9 (21 equidistributed points).
For a color version of the figure, see
www.iste.co.uk/souzadecursi/quantification.zip


4.3. Stochastic methods for deterministic linear systems

Deterministic linear systems have been extensively studied in the
literature, and efficient methods for their numerical solution are widely
available. However, the existing methods generally require the
complete solution of the system, i.e. the determination of all the components
of the solution, even if we are interested in only a few of them. In some
special situations, the direct determination of a small subset of components
of the solution may be of interest, and few approaches address this question.
One of them is furnished by stochastic methods, which generate an
independent determination of each component of the solution: a
single component may thus be directly determined by using a stochastic
approach. The model situation is the following one: let us consider the
linear system

    $A X = B$, [4.26]

where $X = (X_i) \in \mathcal{M}(n, 1)$ is the unknown; $B = (B_i) \in \mathcal{M}(n, 1)$ is a vector
and $A = (A_{ij}) \in \mathcal{M}(n, n)$ verifies

    $A = \alpha\,\mathrm{Id} - \Pi$, [4.27]

where $\alpha > 1$ is a real number and $\Pi = (\Pi_{ij}) \in \mathcal{M}(n, n)$ is a matrix such that

    $\Pi_{ij} \geq 0, \ \forall\, i, j; \quad \sum_{j=1}^{n} \Pi_{ij} = 1, \ \forall\, i \quad (1 \leq i, j \leq n)$. [4.28]

Let us consider $\Omega = \{1, \ldots, n\}$; a function $f : \Omega \to \mathbb{R}$, given by

    $f(i) = B_i, \quad i = 1, \ldots, n$ [4.29]

and a sequence of random variables $\{U_k\}_{k \in \mathbb{N}} \subset \Omega$ such that

    $P(U_{k+1} = j \mid U_k = i) = \Pi_{ij}, \ \forall\, k \in \mathbb{N}; \quad U_0 \in \Omega$; [4.30]

and

    $\forall\, k \in \mathbb{N}: \ P(U_{k+1} = j \mid U_k = i, U_{k-1} = i_{k-1}, \ldots, U_0 = i_0) = \Pi_{ij}, \ \forall\, (j, i, i_{k-1}, \ldots, i_0) \in \Omega^{k+2}$. [4.31]

Let

    $P_0 = (P(U_0 = i)) \in \mathcal{M}(n, 1)$.

Our objective is to establish that:

Theorem 4.1.- Assume that conditions [4.26]-[4.31] are satisfied. Then

    $X_i = \sum_{k=0}^{+\infty} \frac{1}{\alpha^{k+1}} E(f(U_k) \mid U_0 = i)$.


Proof.-

1) Let $i \in \Omega$: if $P(U_0 = i) = 0$, then

2) Let us show that

    $P(U_{k+1} = i_{k+1}, U_k = i_k, U_{k-1} = i_{k-1}, \ldots, U_0 = i_0) = \Pi_{i_0 i_1} \Pi_{i_1 i_2} \cdots \Pi_{i_k i_{k+1}} P(U_0 = i_0)$,
    $\forall\, (i_{k+1}, i_k, i_{k-1}, \ldots, i_0) \in \Omega^{k+2}$. [4.32]

The proof is carried out by recurrence: let k = 0 and consider the events
E = "$U_1 = i_1$"; F = "$U_0 = i_0$". We have

    $P(E \mid F) = \Pi_{i_0 i_1}; \quad P(F) = P(U_0 = i_0)$.

Since

    $P(E \cap F) = P(E \mid F).P(F)$,

we have

    $P(U_1 = i_1, U_0 = i_0) = \Pi_{i_0 i_1} P(U_0 = i_0)$.

So, the first domino is placed. For recurrence, let us assume that equation
[4.32] is verified for a given $k \geq 0$. Taking E = "$U_{k+2} = i_{k+2}$"; F =
"$U_{k+1} = i_{k+1}, U_k = i_k, \ldots, U_0 = i_0$", we obtain, from equation [4.31],

    $P(E \mid F) = \Pi_{i_{k+1} i_{k+2}}; \quad P(F) = P(U_{k+1} = i_{k+1}, U_k = i_k, \ldots, U_0 = i_0)$

and, using that $P(E \cap F) = P(E \mid F).P(F)$, we have

    $P(U_{k+2} = i_{k+2}, U_{k+1} = i_{k+1}, U_k = i_k, \ldots, U_0 = i_0) = \Pi_{i_{k+1} i_{k+2}} P(U_{k+1} = i_{k+1}, U_k = i_k, \ldots, U_0 = i_0)$.

Thus, by using the recurrence assumption for k (equation [4.32])

    $P(U_{k+2} = i_{k+2}, U_{k+1} = i_{k+1}, U_k = i_k, \ldots, U_0 = i_0) = \Pi_{i_0 i_1} \Pi_{i_1 i_2} \cdots \Pi_{i_k i_{k+1}} \Pi_{i_{k+1} i_{k+2}} P(U_0 = i_0)$

and the relation given in equation [4.32] is verified for the value k + 1, which
completes the proof by recurrence.

3) Since

    $P(U_{k+1} = j, U_0 = i) = \sum_{i_k, \ldots, i_1 = 1}^{n} P(U_{k+1} = j, U_k = i_k, U_{k-1} = i_{k-1}, \ldots, U_1 = i_1, U_0 = i)$,

we have

    $P(U_{k+1} = j, U_0 = i) = P(U_0 = i) \sum_{i_k, \ldots, i_1 = 1}^{n} \Pi_{i i_1} \Pi_{i_1 i_2} \cdots \Pi_{i_k j}$,

that is

    $P(U_{k+1} = j, U_0 = i) = [\Pi^{k+1}]_{ij}\, P(U_0 = i)$.

4) Let us establish that

    $\forall\, k \in \mathbb{N}: \ P(U_{k+1} = j \mid U_0 = i) = [\Pi^{k+1}]_{ij}, \ \forall\, i, j \ (1 \leq i, j \leq n)$.

Let us assume that $P(F) \neq 0$: we consider E = "$U_{k+1} = j$"; F = "$U_0 = i$".
Then

    $P(U_{k+1} = j \mid U_0 = i) = \frac{P(E \cap F)}{P(F)} = [\Pi^{k+1}]_{ij}$.

Now, let us assume that $P(F) = 0$: in this case

    $\Pi_{ij} = P(U_1 = j \mid U_0 = i) = 0, \ \forall\, j \ (1 \leq j \leq n)$

and

    $[\Pi^{k+1}]_{ij} = \sum_{i_k, \ldots, i_1 = 1}^{n} \Pi_{i i_1} \Pi_{i_1 i_2} \cdots \Pi_{i_k j} = 0 = P(U_{k+1} = j \mid U_0 = i), \ \forall\, j \ (1 \leq j \leq n)$,

and the equality remains valid.

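5) Thus

    $E(f(U_k) \mid U_0 = i) = \sum_{j=1}^{n} f(j)\, P(U_k = j \mid U_0 = i) = \sum_{j=1}^{n} [\Pi^{k}]_{ij} B_j = [\Pi^{k} B]_i$.

6) In addition, since $\alpha > 1$ and the rows of $\Pi$ sum to 1, the series
below converges and

    $A^{-1} = (\alpha\,\mathrm{Id} - \Pi)^{-1} = \sum_{k=0}^{+\infty} \frac{1}{\alpha^{k+1}} \Pi^{k}$.

So

    $X_i = [A^{-1} B]_i = \left[\left(\sum_{k=0}^{+\infty} \frac{1}{\alpha^{k+1}} \Pi^{k}\right) B\right]_i = \sum_{k=0}^{+\infty} \frac{1}{\alpha^{k+1}} E(f(U_k) \mid U_0 = i)$

and the proof of the theorem is complete. ■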
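Theorem 4.1 suggests a simple Monte Carlo procedure for a single component: simulate walks of the chain started at i and average the discounted sums of f along each walk. The following Python sketch illustrates this under stated assumptions: the series is truncated after `n_steps` terms, states are numbered from 0 (so component i of the text is index i - 1), and the name `solve_component` and its parameters are illustrative, not from the text.

```python
import random

def solve_component(Pi, B, alpha, i, n_walks=5000, n_steps=40, seed=0):
    """Monte Carlo estimate of X_i for (alpha*Id - Pi) X = B.

    Pi is a row-stochastic matrix (list of rows), alpha > 1, and i is a
    zero-based state. Each walk contributes the truncated sum
    sum_k B[U_k] / alpha**(k+1), with U_0 = i.
    """
    rng = random.Random(seed)
    states = range(len(B))
    total = 0.0
    for _ in range(n_walks):
        u = i
        weight = 1.0 / alpha
        acc = 0.0
        for _ in range(n_steps):
            acc += weight * B[u]                       # f(U_k) / alpha**(k+1)
            u = rng.choices(states, weights=Pi[u])[0]  # U_{k+1} drawn from row U_k
            weight /= alpha
        total += acc
    return total / n_walks

# Example: alpha = 2, so A = [[1.5, -0.5], [-0.2, 1.2]] and the exact
# solution of A X = B is X = (2.2/1.7, 3.2/1.7).
Pi = [[0.5, 0.5], [0.2, 0.8]]
B = [1.0, 2.0]
x0 = solve_component(Pi, B, 2.0, 0)
```

Only the requested component is computed; the other components are never formed, which is the point of the approach. The accuracy is limited by the usual Monte Carlo error, which decreases like the inverse square root of the number of walks.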



5 


Nonlinear Algebraic Equations 
Involving Random Parameters 


5.1. Nonlinear systems of algebraic equations 

In this section, we consider systems of algebraic equations:

    $F(X, v) = 0$, [5.1]

where $F: \mathcal{M}(n, 1) \times \mathcal{M}(nr, 1) \to \mathcal{M}(n, 1)$ is a regular function,
$X = (X_i) \in \mathcal{M}(n, 1)$ is the vector of the unknowns, $v = (v_1, \ldots, v_{nr})$ is a
random vector. In this case, X is an implicit function of v and, so, it is also a
random vector, which may be considered as a function of v: X = X(v). Our
objective is the numerical determination of the variable X(v) by using
expansions, as previously introduced (Chapter 3): let us consider an
approximation PX of X on a convenient subspace of random variables, such
that, for instance,

    $PX = \sum_{k=1}^{N_X} \chi_k \varphi_k(\xi)$, i.e. $(PX)_j = \sum_{k=1}^{N_X} \chi_{jk} \varphi_k(\xi)$. [5.2]

In this expression, the unknown to be determined is
$\chi = (\chi_{ij}) \in \mathcal{M}(n, N_X)$. $\mathcal{F} = \{\varphi_k\}_{k \in \mathbb{N}}$ is a family conveniently chosen,
such as, for instance, a total family of a functional space - if v takes its values
in $\Omega \subset \mathbb{R}^{nr}$, a convenient choice is $L^2(\Omega)$. $\xi$ is a random variable
conveniently chosen and $N_X \in \mathbb{N}^*$ is the dimension of the subspace where
the approximation is to be determined.

Taking $\varphi(\xi) = (\varphi_1(\xi), \ldots, \varphi_{N_X}(\xi))^t \in \mathcal{M}(N_X, 1)$, we have:

    $PX = \chi \varphi(\xi)$. [5.3]

In the following, we apply the methods presented in Chapter 3 and, in
addition, we also present other approaches, particularly adapted to equation
[5.1].


5.1.1. Collocation

When a sample $\mathbf{X} = (X_1, \ldots, X_{ns})$ of ns variates from X is available, we
may consider the linear system:

    $PX(\xi_i) = X_i, \quad i = 1, \ldots, ns$. [5.4]

The numerical solution of these equations furnishes the coefficients $\chi$:
as previously observed, this linear system is overdetermined and an adapted
method must be used, such as, for instance, a least-squares solution.

If, in addition, a sample $V = (v_1, \ldots, v_{ns})$ of ns variates from v is
available, the natural choice is $X_i = X(v_i)$ and $\xi_i = v_i$, for i = 1, ..., ns.
The situation is analogous if the values of $v_i$ may be determined from those
of $X_i$. However, if we have only a direct sample from X, i.e. if the values of
$X_i$ are given, but the corresponding values of $v_i$ cannot be determined, we
may introduce an artificial random vector a and take $\xi = a$. As observed in
Chapter 3, the results may be poor if the variables are independent. In this
case, it is mandatory to create some form of dependence between the variables
a and X, which may be obtained, for instance, by reordering the samples in
increasing order. This approach is particularly useful in the situation where
multiple solutions exist (see the examples below). It may be implemented in
Matlab by using the programs of section 3.3.

Example 5.1.- Let us consider the second degree equation:

    $F(X, v) = X^2 - 2X + v = 0$,

where v is uniformly distributed on (0, 1). This equation has as solutions:

    $X_1 = 1 - \sqrt{1 - v}, \quad X_2 = 1 + \sqrt{1 - v}$.

Their cumulative distributions are, respectively,

    $F_1(x) = \begin{cases} 0, & \text{if } x < 0 \\ 1 - (x - 1)^2, & \text{if } 0 \leq x \leq 1 \\ 1, & \text{otherwise;} \end{cases}$

    $F_2(x) = \begin{cases} 0, & \text{if } x < 1 \\ (x - 1)^2, & \text{if } 1 \leq x \leq 2 \\ 1, & \text{otherwise.} \end{cases}$
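The two cumulative distributions follow directly from the uniform distribution of v on (0, 1), for which $P(v \leq t) = t$ when $0 \leq t \leq 1$. As a short check, for $0 \leq x \leq 1$ in the first line and $1 \leq x \leq 2$ in the second:

```latex
\begin{align*}
F_1(x) &= P\left(1 - \sqrt{1 - v} \leq x\right)
        = P\left(\sqrt{1 - v} \geq 1 - x\right)
        = P\left(v \leq 1 - (1 - x)^2\right)
        = 1 - (x - 1)^2, \\
F_2(x) &= P\left(1 + \sqrt{1 - v} \leq x\right)
        = P\left(1 - v \leq (x - 1)^2\right)
        = P\left(v \geq 1 - (x - 1)^2\right)
        = (x - 1)^2 .
\end{align*}
```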

Let us assume that a sample of ns = 11 variates from v is given. In this
case, we may generate the corresponding samples from $X_1$ and $X_2$, by taking
$X_{1i} = X_1(v_i)$ and $X_{2i} = X_2(v_i)$. If such a sample is not available (for
instance, if the values of v are unknown), but only the values of $X_1$ and $X_2$
are given, we may consider a random variable a uniformly distributed on
(-1, 1) and consider that $X_1 = X_1(a)$, $X_2 = X_2(a)$. In order to generate
dependence between these variables, we may, for instance, sort
$X_{1i}$ and $X_{2i}$ in increasing order and use an increasingly ordered sample
of a. A uniform grid $a_i = -1 + 2(i - 1)/ns$ may also be used - these points are
in increasing order.

The results furnished by a polynomial basis with ns = 11 points and a
least-squares solution of equation [5.4] are shown in Figures 5.1 and
5.2.
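The reordering idea can be sketched in a few lines of Python. The sketch below assumes a monomial basis, a least-squares solution of [5.4] through the normal equations, and a sample of $X_1$ built from equally spaced values of v and then shuffled to mimic the case where v is unknown; the helper names (`solve_linear`, `collocation_fit`, `evaluate`) are illustrative and are not the programs of section 3.3.

```python
import math
import random

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def collocation_fit(a, xs, n_basis):
    """Least-squares coefficients chi of sum_k chi[k] * a**k fitted to xs."""
    Phi = [[ai ** k for k in range(n_basis)] for ai in a]
    G = [[sum(row[i] * row[j] for row in Phi) for j in range(n_basis)]
         for i in range(n_basis)]
    rhs = [sum(row[i] * x for row, x in zip(Phi, xs)) for i in range(n_basis)]
    return solve_linear(G, rhs)

def evaluate(chi, a):
    """Value of the expansion PX at the point a."""
    return sum(c * a ** k for k, c in enumerate(chi))

ns = 11
# Values of X_1 observed in arbitrary order (v is treated as unknown).
sample = [1.0 - math.sqrt(1.0 - i / ns) for i in range(ns)]
random.Random(1).shuffle(sample)

a = [-1.0 + 2.0 * i / ns for i in range(ns)]  # increasing grid on (-1, 1)
xs = sorted(sample)                            # reordering creates the dependence
chi = collocation_fit(a, xs, n_basis=5)
max_residual = max(abs(evaluate(chi, ai) - xi) for ai, xi in zip(a, xs))
```

Pairing the sorted values with the increasing grid is what ties the artificial variable a to X; fitting the unsorted sample against the same grid would give a poor approximation, as noted above.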


Figure 5.1. Results for $X_1$ in Example 5.1 (11 random points). Panels: first
root, polynomial, $N_X = 5$; cdf of the first root, polynomial, $N_X = 5$. For a
color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip

Figure 5.2. Results for $X_2$ in Example 5.1 (11 random points). Panels: second
root, polynomial, $N_X = 5$; cdf of the second root, polynomial, $N_X = 5$. For a
color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip


The results may be improved if additional points - thus, additional
information - are available. For instance, the results obtained for ns = 21
points are shown in Figures 5.3 and 5.4.



Figure 5.3. Results for $X_1$ in Example 5.1 (21 random points). For a color
version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip

Figure 5.4. Results for $X_2$ in Example 5.1 (21 random points). For a color
version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip


It must be noted that the results furnished by a sample of ns equally
spaced values of $v_i$ are identical to those furnished by ns equally
spaced values of $a_i$ - this result is not surprising, since, in such a case,
there exists a connection between these values: $a_i = 2v_i - 1$, so that each of
the variables a and v is a function of the other. The results corresponding to
this situation are exhibited in Figures 5.5 and 5.6.




Figure 5.5. Results for $X_1$ in Example 5.1 (11 equidistant points). For a color
version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip

Figure 5.6. Results for $X_2$ in Example 5.1 (11 equidistant points). For a color
version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip


A situation of practical interest is the one where values of both $X_1$ and
$X_2$ are mixed in a sample and cannot be distinguished: the sample contains
values generated by both roots and we do not have supplementary
information that makes it possible to separate it into two samples, each
formed by a single variable. For instance, let us consider a sample formed by
40 values, of which 20 correspond to $X_1$ and the other 20 to $X_2$. Let us
consider 40 equally spaced artificial values $a_i$, such that $a_1 = -1$ and
$a_{40} = 1$. The results furnished by the method described are shown in
Figure 5.7.




Figure 5.7. Results for a sample where the roots are mixed - Example 5.1 (20
random points of each root). For a color version of the figure, see
www.iste.co.uk/souzadecursi/quantification.zip

5.1.2. Moment fitting

A second traditional method for the numerical exploitation of samples is
the moment matching method, presented in Chapter 3: the
coefficients of the expansion are determined in such a way that the first
moments of the approximation $M^a(PX) = (M_1^a, M_2^a, \ldots)$ coincide with
$M^e = (M_1^e, M_2^e, \ldots)$, the empirical moments of the data. This may be
achieved by solving the nonlinear system of equations $M^a(PX) = M^e$ or by
minimizing an objective function corresponding to a pseudo-distance
$d(M^a(PX), M^e)$ measuring the distance between the vectors of moments. It
may be implemented in Matlab by using the programs of section 3.2.
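The quantities involved can be sketched as follows in Python. This is an illustration only: the pseudo-distance chosen here (a sum of squared relative differences) is one possible choice and is not necessarily the one used by the programs of section 3.2, the basis is assumed to be the monomials, and the function names are hypothetical. The minimization itself is left to any optimizer.

```python
import math

def empirical_moments(sample, n_moments):
    """Empirical moments M_p = (1/ns) * sum_i x_i**p, for p = 1..n_moments."""
    ns = len(sample)
    return [sum(x ** p for x in sample) / ns for p in range(1, n_moments + 1)]

def moment_distance(m_a, m_e):
    """Sum of squared relative differences between two vectors of moments."""
    return sum(((a - e) / e) ** 2 for a, e in zip(m_a, m_e))

def objective(chi, xi, m_e):
    """Distance between the moments of PX = sum_k chi[k] * xi**k and m_e."""
    px = [sum(c * x ** k for k, c in enumerate(chi)) for x in xi]
    return moment_distance(empirical_moments(px, len(m_e)), m_e)

# Data: first root of Example 5.1 on a fine equally spaced sample of v.
ns = 1000
vs = [(i + 0.5) / ns for i in range(ns)]
data = [1.0 - math.sqrt(1.0 - v) for v in vs]
m_e = empirical_moments(data, 2)   # close to the exact values 1/3 and 1/6
```

For the first root, the exact first two moments are $E(X_1) = 1/3$ and $E(X_1^2) = 1/6$, which gives a quick sanity check of the empirical values.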

Example 5.2.- Let us consider the situation described in Example 5.1.
Assume that a sample of ns = 21 variates from v is given. Then, as in
Example 5.1, we may generate samples from $X_1$ and $X_2$, by taking
$X_{1i} = X_1(v_i)$ and $X_{2i} = X_2(v_i)$, which permits the application of the
moment matching method. The results furnished by a polynomial basis are
presented in Figures 5.8 and 5.9.


Figure 5.8. Results for $X_1$ in Example 5.2 (21 random points). Panel
annotations: moments = 5, basis elements = 6; rel err = 0.0034882, sum rel
err = 0.0016713; abs err = 0.0013585, norm of eq = 1.0399e-011. For a color
version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip

Figure 5.9. Results for $X_2$ in Example 5.2 (21 random points). For a color
version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip


The results obtained with a sample of equidistant points $v_i = (i - 1)/ns$
are exhibited in Figures 5.10 and 5.11.


Figure 5.10. Results for $X_1$ in Example 5.2 (21 equidistant points). Panel
annotation: moments = 5, basis elements = 6. For a
color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip

Figure 5.11. Results for $X_2$ in Example 5.2 (21 equidistant points). Panel
annotation: moments = 5, basis elements = 6. For a
color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip


When the values of $v_i$ are unknown, we may consider - as in Example 5.1 -
an artificial variable a. For instance, let us consider a sample of 20 variates
from $X_1$ and 20 variates from $X_2$, together with a sample of 40 equally
spaced points $a_i$, such that $a_1 = -1$ and $a_{40} = 1$. The results are shown
in Figure 5.12.




Figure 5.12. Results for a sample of mixed roots in Example 5.1 (20 random
points of each root). For a color version of the figure, see
www.iste.co.uk/souzadecursi/quantification.zip


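5.1.3. Variational approximation

From the variational standpoint, equation [5.1] is approximated by:

    $F(PX, v) = 0$. [5.5]

Since (from equation [5.3]),

    $F(PX, v) = F(\chi \varphi(\xi), v)$,

we have:

    $\varphi(\xi)^t D^t F(\chi \varphi(\xi), v) = 0, \ \forall\, D \in \mathcal{M}(n, N_X)$. [5.6]

Since:

    $\varphi(\xi)^t D^t F(\chi \varphi(\xi), v) = \sum_{i=1}^{n} \sum_{m=1}^{N_X} \varphi_m(\xi)\, D_{im}\, F_i\left(\sum_{k=1}^{N_X} \chi_k \varphi_k(\xi), v\right), \ \forall\, D \in \mathcal{M}(n, N_X)$,

we have, by taking the mean:

    $\sum_{i=1}^{n} \sum_{m=1}^{N_X} D_{im}\, E\left(\varphi_m(\xi)\, F_i\left(\sum_{k=1}^{N_X} \chi_k \varphi_k(\xi), v\right)\right) = 0, \ \forall\, D \in \mathcal{M}(n, N_X)$.

Taking $D_{im} = \delta_{ir} \delta_{ms}$, we obtain:

    $E\left(\varphi_s(\xi)\, F_r\left(\sum_{k=1}^{N_X} \chi_k \varphi_k(\xi), v\right)\right) = 0, \quad 1 \leq r \leq n, \ 1 \leq s \leq N_X$. [5.7]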
Equations [5.7] form a system of $n \times N_X$ nonlinear equations for the
$n \times N_X$ unknowns $\chi = (\chi_{ij}) \in \mathcal{M}(n, N_X)$. It must be solved by an
adequate method in order to furnish the coefficients $\chi$. It must be noticed
that the construction of equation [5.7] requires the knowledge of the joint
distribution of the pair $(\xi, v)$ - theoretically or empirically, by means of a
sample of the pair. In addition, better results are obtained by using $\xi = v$.

This approach is implemented in Matlab by a modification of the method
introduced in section 3.1.5. For instance, assume that F(PX, v) is furnished
by the subprogram F(X,v), while the value of $\varphi_k(v)$ is furnished by the
subprogram phi(k,v). Moreover, assume that a program eqsolver(G) is
available and furnishes the solution of the nonlinear system of equations
G(C) = 0. In practice, eqsolver(G) may involve additional parameters,
such as, for instance, a starting point, a maximum number of iterations, etc. In
this case, the reader must adapt the code below in order to include the
additional parameters. An example of code is given below:

Listing 5.1. UQ of nonlinear equations by variational approach

function chi = expcoef(F, vs, phi, eqsolver)
%
% determines the coefficients of the expansion
% by using a sample
%
% IN:
% F : vector of equations - type anonymous function
% vs : table of values of v - type array of double
%      vs(:,i) is a variate from v
% phi : basis function phi(k,v) - type anonymous function
% eqsolver : subprogram solving the nonlinear system FC(C) = 0 - type
%            anonymous function
%
% OUT:
% chi : n x N_X matrix of the coefficients - type array of double
%
% NOTE: n (number of unknowns) and N_X (order of the expansion) are not
% arguments of this function; they must be made available here (for
% instance, as additional arguments).
FC = @(C) equations_for_coefficients(F, C, vs, phi, n, N_X);
Csol = eqsolver(FC);
chi = zeros(n, N_X);
for r = 1:n
    for s = 1:N_X
        alffa = index_map(r, s, n, N_X);
        chi(r,s) = Csol(alffa);
    end;
end;
return;
end

%
function w = function_proj(f, chi, vs, phi)
%
% maps f(PX(v),v) on the sample vs from v
% f is assumed to have the same dimension as the
% number of lines of chi
%
% IN:
% f : the function to be evaluated - type anonymous function
% chi : n x N_X matrix of the coefficients - type array of double
% vs : table of values of v - type array of double
%      vs(:,i) is a variate from v
% phi : basis function phi(k,v) - type anonymous function
%
% OUT:
% w : n x ns table of values of f - type array of double
%
ns = size(vs, 2);    % one variate per column of vs
n = size(chi, 1);
PXs = projection(chi, vs, phi);
w = zeros(n, ns);
for i = 1:ns
    w(:,i) = f(PXs(:,i), vs(:,i));
end;
return;
end


%
function N = iteration_variational_matrix(phi, f, chi, vs, n, N_X)
%
% generates the matrix N such that
% N_rs = E(phi_s(v) f_r(X(v)))
% assumes that f furnishes a vector of length n
% and the number of lines of chi is also n
%
% IN:
% phi : basis function phi(k,v) - type anonymous function
% f : iteration function - type anonymous function
% chi : n x N_X matrix of the coefficients - type array of double
% vs : table of values of v - type array of double
%      vs(:,i) is a variate from v
% n : number of unknowns (length of X) - type integer
% N_X : order of the expansion - type integer
%
% OUT:
% N : n x N_X table of scalar products - type array of double
%
N = zeros(n, N_X);    % indexed as N(r,s), r = 1..n, s = 1..N_X
w = function_proj(f, chi, vs, phi);
for r = 1:n
    Y = w(r,:);
    for s = 1:N_X
        f1 = @(U) phi(s,U);
        Z = map(f1, vs, 1);
        aux = scalprod(Y, Z);
        N(r,s) = aux;
    end;
end;
return;
end
%
function B = equations_for_coefficients(F, C, vs, phi, n, N_X)
%
% evaluates the equations for the coefficients of the expansion
%
% IN:
% F : vector of equations - type anonymous function
% C : nN_X x 1 vector of the coefficients - type array of double
% vs : table of values of v - type array of double
%      vs(:,i) is a variate from v
% phi : basis function phi(k,v) - type anonymous function
% n : number of unknowns - type integer
% N_X : order of the expansion - type integer
%
% OUT:
% B : nN_X x 1 vector of the equations - type array of double
%
chi = zeros(n, N_X);
for r = 1:n
    for s = 1:N_X
        alffa = index_map(r, s, n, N_X);
        chi(r,s) = C(alffa);
    end;
end;
N = iteration_variational_matrix(phi, F, chi, vs, n, N_X);
B = zeros(size(C));
for r = 1:n
    for s = 1:N_X
        alffa = index_map(r, s, n, N_X);
        B(alffa) = N(r,s);
    end;
end;
return;
end

%
function v = index_map(i, j, n, N_X)
% flat index of the pair (i,j), 1 <= i <= n, 1 <= j <= N_X
v = i + (j - 1)*n;
return;
end


Example 5.3.- Let us consider the situation presented in Example 5.1. We
may determine an approximation under the form of a polynomial function with
$\xi = v$ and numerically solve equation [5.6] by the Newton-Raphson iterations.
The results obtained are shown in Figures 5.13-5.15. In the first figure, the
means have been evaluated by numerical integration. In the second figure, we
consider 21 equally spaced values of v. The last figure corresponds to a sample
of 21 random realizations of v.




Figure 5.13. Results obtained in Example 5.3 (numerical integration). For a
color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip

Figure 5.14. Results obtained in Example 5.3 (equidistant points). For a color
version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip


The use of a variable $\xi \neq v$ is illustrated in Figures 5.16 and 5.17: we
consider a sample of ns = 21 variates from v and 21 points
$a_i$, equally spaced on (-1, 1), such that $a_1 = -1$ and $a_{21} = 1$. The results
obtained by using $\xi = a$ and a polynomial function of degree 5 are shown in
Figure 5.16, while the results for a polynomial function of degree 8 are shown
in Figure 5.17.

Figure 5.15. Results obtained in Example 5.3 (random points). For a color
version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip

Figure 5.16. Results obtained in Example 5.3 ($\xi \neq v$, sample of
21 variates). For a color version of the figure,
see www.iste.co.uk/souzadecursi/quantification.zip


5.1.4. Adaptation of iterative methods

Let us consider an iterative method for the numerical solution of equation
[5.1], having as iteration function $\Psi$: the method generates a sequence
$\{X^{(p)}\}_{p \in \mathbb{N}}$ starting from the initial point $X^{(0)}$ and verifying:

    $X^{(p+1)} = \Psi(X^{(p)})$. [5.8]

Figure 5.17. Results obtained in Example 5.3 ($\xi \neq v$, sample of
21 points). For a color version of the figure, see
www.iste.co.uk/souzadecursi/quantification.zip


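Such a method may be adapted to the determination of PX. Let us
introduce:

    $PX^{(p)} = \sum_{k=1}^{N_X} \chi_k^{(p)} \varphi_k(\xi)$, i.e. $(PX^{(p)})_j = \sum_{k=1}^{N_X} \chi_{jk}^{(p)} \varphi_k(\xi)$. [5.9]

Then, the iterations may be approximated as follows:

    $PX^{(p+1)} = \Psi(PX^{(p)})$. [5.10]

Let us adopt the variational standpoint; we have:

    $\varphi(\xi)^t D^t PX^{(p+1)} = \varphi(\xi)^t D^t \Psi(PX^{(p)}), \ \forall\, D \in \mathcal{M}(n, N_X)$,

and

    $\sum_{k,m=1}^{N_X} \sum_{i=1}^{n} D_{im}\, \varphi_m(\xi)\, \chi_{ik}^{(p+1)} \varphi_k(\xi) = \sum_{m=1}^{N_X} \sum_{i=1}^{n} D_{im}\, \varphi_m(\xi)\, \Psi_i\left(\sum_{k=1}^{N_X} \chi_k^{(p)} \varphi_k(\xi)\right)$.

So,

    $\sum_{k,m=1}^{N_X} \sum_{i=1}^{n} D_{im}\, E\left(\varphi_m(\xi)\, \chi_{ik}^{(p+1)} \varphi_k(\xi)\right) = \sum_{m=1}^{N_X} \sum_{i=1}^{n} D_{im}\, E\left(\varphi_m(\xi)\, \Psi_i\left(\sum_{k=1}^{N_X} \chi_k^{(p)} \varphi_k(\xi)\right)\right)$.

Taking $D_{im} = \delta_{ir} \delta_{ms}$ in this equality, we have, for $1 \leq r \leq n$,
$1 \leq s \leq N_X$:

    $\sum_{k=1}^{N_X} E\left(\varphi_s(\xi) \varphi_k(\xi)\right) \chi_{rk}^{(p+1)} = E\left(\varphi_s(\xi)\, \Psi_r\left(\sum_{k=1}^{N_X} \chi_k^{(p)} \varphi_k(\xi)\right)\right)$. [5.11]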

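These equations form a linear system for the unknowns $\chi_{rk}^{(p+1)}$. Let us
denote:

    $A_{rsjk} = \delta_{jr}\, E\left(\varphi_s(\xi) \varphi_k(\xi)\right), \quad B_{rs} = E\left(\varphi_s(\xi)\, \Psi_r\left(\sum_{k=1}^{N_X} \chi_k^{(p)} \varphi_k(\xi)\right)\right)$. [5.12]

Then,

    $\sum_{j=1}^{n} \sum_{k=1}^{N_X} A_{rsjk}\, \chi_{jk}^{(p+1)} = B_{rs}, \quad 1 \leq r \leq n, \ 1 \leq s \leq N_X$.

Let us redefine the indexes by:

    $ind(j, k) = (k - 1)n + j$

and set $A = (A_{\alpha\beta}) \in \mathcal{M}(nN_X, nN_X)$, $C = (C_\beta) \in \mathcal{M}(nN_X, 1)$,
$B = (B_\alpha) \in \mathcal{M}(nN_X, 1)$ given by:

    $A_{\alpha\beta} = A_{rsjk}, \quad B_\alpha = B_{rs}, \quad C_\beta = \chi_{jk}^{(p+1)}, \quad \alpha = ind(r, s), \ \beta = ind(j, k)$. [5.13]

Then:

    $A C = B$. [5.14]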
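The renumbering $ind(j, k) = (k - 1)n + j$ simply stacks the columns of the coefficient matrix one after the other. A small Python sketch of the map, its inverse and the resulting vector C is given below; indices are kept one-based as in the text, and the names `ind_inverse` and `flatten` are illustrative.

```python
def ind(j, k, n):
    """One-based flat index of the pair (j, k), with 1 <= j <= n."""
    return (k - 1) * n + j

def ind_inverse(beta, n):
    """Pair (j, k) such that ind(j, k, n) == beta."""
    k0, j0 = divmod(beta - 1, n)
    return j0 + 1, k0 + 1

def flatten(chi):
    """Stack the n x N_X coefficients chi[j-1][k-1] into C, ordered by ind(j, k)."""
    n, n_x = len(chi), len(chi[0])
    C = [0.0] * (n * n_x)
    for j in range(1, n + 1):
        for k in range(1, n_x + 1):
            C[ind(j, k, n) - 1] = chi[j - 1][k - 1]
    return C

# n = 2 unknowns, N_X = 3 basis functions: the columns are stacked in order.
C = flatten([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

This is the same ordering as the function index_map of the Matlab listings, so a vector C produced by a solver can be reshaped back into the coefficient matrix with the inverse map.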

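The solution of this linear system determines C and, so, $\chi^{(p+1)}$. In practice,
the solution may be determined by solving n linear systems for $N_X$ unknowns
and a fixed matrix $M \in \mathcal{M}(N_X, N_X)$: let $N_r \in \mathcal{M}(N_X, 1)$ and
$C_r \in \mathcal{M}(N_X, 1)$ be such that:

    $M_{sk} = E\left(\varphi_s(\xi) \varphi_k(\xi)\right), \quad (N_r)_s = E\left(\varphi_s(\xi)\, \Psi_r\left(\sum_{k=1}^{N_X} \chi_k^{(p)} \varphi_k(\xi)\right)\right), \quad (C_r)_k = \chi_{rk}^{(p+1)}$; [5.15]

then,

    $M C_r = N_r, \quad 1 \leq r \leq n$. [5.16]

The solution of equation [5.16] for a given r furnishes the values of
$\chi_{rk}^{(p+1)}$, for $1 \leq k \leq N_X$.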
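As an illustration of [5.15]-[5.16], the following Python sketch applies the projected iterations to the scalar equation of Example 5.1 (n = 1). Everything specific in it is an assumption made for the example: the fixed-point form $\Psi(X, v) = (X^2 + v)/2$ (which converges to the first root), the choice $\xi = v$, a monomial basis with $N_X = 6$, means estimated on the equally spaced sample $v_i = (i - 1)/ns$, and the helper names.

```python
import math

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def gram_matrix(phi, xi, n_basis):
    """Sample estimate of M_sk = E(phi_s(xi) * phi_k(xi))."""
    ns = len(xi)
    return [[sum(phi(s, x) * phi(k, x) for x in xi) / ns
             for k in range(n_basis)] for s in range(n_basis)]

def iterate_coefficients(psi, phi, xi, vs, chi, n_basis, n_iter):
    """Repeat: N_s = E(phi_s * psi(PX, v)), then solve M.C = N for the new chi."""
    ns = len(xi)
    M = gram_matrix(phi, xi, n_basis)
    for _ in range(n_iter):
        px = [sum(chi[k] * phi(k, x) for k in range(n_basis)) for x in xi]
        rhs = [sum(phi(s, x) * psi(p, v) for x, p, v in zip(xi, px, vs)) / ns
               for s in range(n_basis)]
        chi = solve_linear(M, rhs)
    return chi

phi = lambda k, x: x ** k                # monomial basis (zero-based k)
psi = lambda X, v: 0.5 * (X * X + v)     # fixed-point form of X^2 - 2X + v = 0
ns, n_basis = 21, 6
vs = [i / ns for i in range(ns)]         # v_i = (i - 1)/ns, i = 1..ns
chi = iterate_coefficients(psi, phi, vs, vs, [0.0] * n_basis, n_basis, 200)
approx = sum(chi[k] * 0.5 ** k for k in range(n_basis))   # PX at v = 0.5
```

The matrix M is built once and reused at every iteration, which is the practical advantage of [5.16] over the full system [5.14]; for n > 1 the same M serves all the n right-hand sides.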

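In many practical situations, the iteration function reads as:

    $\Psi(X) = X + \Phi(X)$.

In this case, the iterations [5.8] take the form:

    $X^{(p+1)} = X^{(p)} + \Delta X^{(p)}; \quad \Delta X^{(p)} = \Phi(X^{(p)})$, [5.17]

and we have:

    $\chi^{(p+1)} = \chi^{(p)} + \Delta\chi^{(p)}$, [5.18]

where $\Delta\chi^{(p)}$ is determined by solving the linear system $A \Delta C = \Delta B$,
analogous to [5.14], with $\Phi$ replacing $\Psi$ in [5.12] (recall that $\beta = ind(j, k)$):

    $\Delta B_{rs} = E\left(\varphi_s(\xi)\, \Phi_r\left(\sum_{k=1}^{N_X} \chi_k^{(p)} \varphi_k(\xi)\right)\right), \quad \Delta C_\beta = \Delta\chi_{jk}^{(p)}$. [5.19]

Analogous to [5.16], we may determine $\Delta\chi^{(p)}$ by solving n linear systems
$M \Delta C_r = \Delta N_r$, with $\Phi$ replacing $\Psi$ in [5.15]:

    $(\Delta N_r)_s = E\left(\varphi_s(\xi)\, \Phi_r\left(\sum_{k=1}^{N_X} \chi_k^{(p)} \varphi_k(\xi)\right)\right), \quad (\Delta C_r)_k = \Delta\chi_{rk}^{(p)}$. [5.20]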


This approach is implemented as follows: assume that, on the one hand, the
value of $\varphi_k(v)$ is furnished by the subprogram phi(k,v) while, on the other
hand, the iteration function $\Phi(X, v)$ is evaluated by a subprogram
Phiiter(X,v). The first step is the evaluation of the means involved in the
equations presented. Such means correspond to scalar products and may be
evaluated by integration or by using a sample. Here, we assume that a sample
of ns variates from v is available. Then:

Listing 5.2. Adaptation of an iterative method

function chi = expcoef(f_iter, chi_ini, vs, phi, nitmax, errmax)
%
% determines the coefficients of the expansion
% by using a sample
%
% IN:
% f_iter : iteration function - type anonymous function
% chi_ini : initial n x N_X matrix of the coefficients - type array of double
% vs : table of values of v - type array of double
%      vs(:,i) is a variate from v
% phi : basis function phi(k,v) - type anonymous function
% nitmax : max iteration number - type integer
% errmax : max precision - type double
%
% OUT:
% chi : n x N_X matrix of the coefficients - type array of double
%
n = size(chi_ini, 1);      % number of unknowns
N_X = size(chi_ini, 2);    % order of the expansion
M = fixed_variational_matrix(phi, vs);
A = tab4(M, n, N_X);
chi = chi_ini;
nit = 0;
not_stop = 1;
while not_stop
    nit = nit + 1;
    delta_chi = iteration_sample(f_iter, chi, vs, phi, A);
    chi = chi + delta_chi;
    err = norm(delta_chi);
    not_stop = nit < nitmax && err > errmax;
end;
return;
end

%
function w = function_proj(f, chi, vs, phi)
%
% maps f(PX(v),v) on the sample vs from v
% f is assumed to have the same dimension as the
% number of lines of chi
%
% IN:
% f : the function to be evaluated - type anonymous function
% chi : n x N_X matrix of the coefficients - type array of double
% vs : table of values of v - type array of double
%      vs(:,i) is a variate from v
% phi : basis function phi(k,v) - type anonymous function
%
% OUT:
% w : n x ns table of values of f - type array of double
%
ns = size(vs, 2);    % one variate per column of vs
n = size(chi, 1);
PXs = projection(chi, vs, phi);
w = zeros(n, ns);
for i = 1:ns
    w(:,i) = f(PXs(:,i), vs(:,i));
end;
return;
end

%
function A = tab4(M, n, N_X)
%
% generates the table A from the table of
% scalar products of the basis functions
% M_ij = E(phi_i(v) phi_j(v))
%
% IN:
% M : N_X x N_X table of scalar products - type array of double
% n : number of unknowns (length of X) - type integer
% N_X : order of the expansion - type integer
%
% OUT:
% A : nN_X x nN_X table - type array of double
%     contains A(alpha,beta)
%
aaaa = zeros(n, N_X, n, N_X);
for r = 1:n
    for s = 1:N_X
        for j = 1:n
            for k = s:N_X
                if r == j
                    aux = M(s,k);
                    aaaa(r,s,j,k) = aux;
                    aaaa(j,k,r,s) = aux;
                end;
            end;
        end;
    end;
end;
nn = n*N_X;
A = zeros(nn, nn);
for r = 1:n
    for s = 1:N_X
        alffa = index_map(r, s, n, N_X);
        for j = 1:n
            for k = 1:N_X
                betta = index_map(j, k, n, N_X);
                A(alffa,betta) = aaaa(r,s,j,k);
            end;
        end;
    end;
end;
return;
end


%
function B = tab2(N, n, N_X)
%
% generates the table B from table N
% N_rs = E(phi_s(v) f_r(PX(v)))
%
% IN:
% N : n x N_X table of scalar products - type array of double
% n : number of unknowns (length of X) - type integer
% N_X : order of the expansion - type integer
%
% OUT:
% B : nN_X x 1 table - type array of double
%     contains B(alpha)
%
nn = n*N_X;
B = zeros(nn, 1);
for r = 1:n
    for s = 1:N_X
        alffa = index_map(r, s, n, N_X);
        B(alffa) = N(r,s);
    end;
end;
return;
end

%
function M = fixed_variational_matrix(phi, vs)
%
% generates the matrix M such that
% M_ij = E(phi_i(v) phi_j(v))
%
% IN:
% phi : basis function phi(k,v) - type anonymous function
% vs : table of values of v - type array of double
%      vs(:,i) is a variate from v
%
% OUT:
% M : N_X x N_X table of scalar products - type array of double
%
% NOTE: N_X (order of the expansion) is not an argument of this function;
% it must be made available here (for instance, as an additional argument).
M = zeros(N_X, N_X);
for i = 1:N_X
    f1 = @(U) phi(i,U);
    Y = map(f1, vs, 1);
    M(i,i) = scalprod(Y, Y);
    for j = i+1:N_X
        f1 = @(U) phi(j,U);
        Z = map(f1, vs, 1);
        aux = scalprod(Y, Z);
        M(i,j) = aux;
        M(j,i) = aux;
    end;
end;
return;
end

% 

function N = iteration_variational_matrix(phi, f, chi, vs, n, N_X)
%
% generates the matrix N such that
% N_rs = E(phi_s(v) f_r(PX(v)))
% assumes that f furnishes a vector of length n
% and that chi also has n lines
%
% IN:
% phi : basis function phi(k,v) - type anonymous function
% f : iteration function - type anonymous function
% chi : n x N_X matrix of the coefficients - type array of double
% vs : table of values of v - type array of double
%      vs(:, i) is a variate from v
% n : number of unknowns (length of X) - type integer
% N_X : order of the expansion - type integer
%
% OUT:
% N : n x N_X table of scalar products - type array of double
%
N = zeros(n, N_X);
w = function_proj(f, chi, vs, phi);
for r = 1:n
    Y = w(r, :);
    for s = 1:N_X
        f1 = @(U) phi(s, U);
        Z = map(f1, vs, 1);
        aux = scalprod(Y, Z);
        N(r, s) = aux;
    end;
end;
return;
end

% 

function delta_chi = iteration_sample(f_iter, chi_old, vs, phi, A)
%
% evaluates the variation of the coefficients
%
% IN:
% f_iter : iteration function - type anonymous function
% chi_old : n x N_X matrix of the coefficients - type array of double
% vs : table of values of v - type array of double
%      vs(:, i) is a variate from v
% phi : basis function phi(k,v) - type anonymous function
% A : nN_X x nN_X table - type array of double
%     contains A(alpha, beta)
%
% OUT:
% delta_chi : n x N_X matrix of the coefficients - type array of double
%     chi = chi_old + delta_chi
%
n = size(chi_old, 1);
N_X = size(chi_old, 2);
N = iteration_variational_matrix(phi, f_iter, chi_old, vs, n, N_X);
B = tab2(N, n, N_X);
delta_C = A \ B;
delta_chi = zeros(size(chi_old));
for r = 1:n
    for s = 1:N_X
        alffa = index_map(r, s, n, N_X);
        delta_chi(r, s) = delta_C(alffa);
    end;
end;
return;
end

function v = index_map(i, j, n, N_X)
% maps the pair (i, j), 1 <= i <= n, 1 <= j <= N_X, to a single index
v = i + (j - 1)*n;
return;
end
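
The role of index_map in the listings above can be illustrated by a short script. This is only a sketch: the sizes n = 3 and N_X = 4 are arbitrary values chosen for the illustration, and index_map is redefined locally as an anonymous function so that the script is self-contained.

```matlab
% Sketch: alpha = i + (j - 1)*n enumerates 1, ..., n*N_X exactly once.
n = 3;        % illustrative number of unknowns (assumed value)
N_X = 4;      % illustrative order of the expansion (assumed value)
index_map = @(i, j, n, N_X) i + (j - 1)*n;   % same formula as in the listing

alpha = zeros(n, N_X);
for i = 1:n
    for j = 1:N_X
        alpha(i, j) = index_map(i, j, n, N_X);
    end;
end;

% column-major ordering: alpha(:) must be 1, 2, ..., n*N_X
is_bijection = isequal(alpha(:)', 1:n*N_X);

% consequently, the double loops of the listings amount to a reshape
chi = rand(n, N_X);
C = zeros(n*N_X, 1);
for i = 1:n
    for j = 1:N_X
        C(index_map(i, j, n, N_X)) = chi(i, j);
    end;
end;
same_as_reshape = isequal(C, chi(:));
```

Under these assumptions, the map coincides with Matlab's column-major storage, so that chi(:) and reshape(C, n, N_X) may replace the explicit loops if preferred.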


Example 5.4.- Let us consider again the situation presented in Example 5.1. 
When using Newton's iterations for the solution of the second-degree equation, 
the iteration function is built from the ratio: 

\[ \frac{X^2 - 2X + v}{2X - 2}. \]

We show in Figure 5.18 the results obtained after 100 iterations, starting 
from $\chi_{jk} = 1$, $\forall j, k$. In these calculations, we use an approximation by a 
polynomial function of degree 5 and the means are estimated by using ns = 21 
equally spaced points $v_i$. 

In Figure 5.19, we exhibit the results furnished by 100 iterations starting 
from $\chi_{jk} = 0$, $\forall j, k$. In these calculations, the means have been evaluated by 
numerical integration using the internal function quad of Matlab. 


5.2. Numerical solution of noisy deterministic systems of nonlinear 
equations 

In this section, we consider the solution of deterministic algebraic systems 
of nonlinear equations: 

\[ F(x) = 0, \]

where $F: \mathcal{M}(n,1) \to \mathbb{R}^n$ is the function corresponding to the vector of equations 
and $x = (x_j) \in \mathcal{M}(n,1)$ is the vector of unknowns. We assume that these 
equations are deterministic and do not contain random variables: $x$ is a vector 
of real numbers. 




Figure 5.18. Results obtained in Example 5.4 (ns = 21 equidistributed points 
$v_i$). For a color version of the figure, see 
www.iste.co.uk/souzadecursi/quantification.zip 

Figure 5.19. Results obtained in Example 5.4 (numerical integration). For a 
color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip 


We are mainly interested in the situations where the evaluation of $F(x)$ 
involves errors or noise: for instance, when only approximations of $F(x)$ may 
be constructed. In such a situation, standard methods, for instance 
Newton-Raphson, may fail, since the errors in the evaluation of $F(x)$ and its 
derivatives may introduce spurious oscillations or divergence. In the 
framework of stochastic methods, an alternative is the use of the algorithm of 
Robbins-Monro, which may be considered as an extension of the simple 
fixed-point iterations 

\[ x^{(k+1)} = x^{(k)} - \theta F(x^{(k)}), \quad \theta > 0, \quad x^{(0)} \text{ given}. \]

Such a simple relaxation procedure may be generalized by the 
introduction of a different $\theta$ at each iteration $k$: 

\[ x^{(k+1)} = x^{(k)} - \theta_k F(x^{(k)}), \quad \theta_k > 0, \quad x^{(0)} \text{ prescribed}. \]

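When the sequence $\{\theta_k\}_{k \in \mathbb{N}}$ verifies: 

\[ \theta_k > 0, \ \forall k \in \mathbb{N}; \quad \sum_{i=0}^{+\infty} \theta_i = +\infty; \quad \sum_{i=0}^{+\infty} \theta_i^2 < +\infty, \qquad [5.21] \]

the method is referred to as being the algorithm of Robbins-Monro. These 
conditions may be satisfied simply using: 

\[ \theta_k = \frac{\theta}{a + b(k+1)}, \quad a, b, \theta > 0. \]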
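
As an illustration of the two conditions on the step sequence, the partial sums may be observed numerically. The script below is a minimal sketch with the arbitrary choice $\theta = a = b = 1$ (an assumption made only for this illustration), which gives $\theta_k = 1/(k+2)$: the partial sums of $\theta_k$ keep growing (logarithmically), while those of $\theta_k^2$ remain below $\pi^2/6 - 1$.

```matlab
% Sketch: partial sums for theta_k = theta/(a + b*(k + 1)).
theta = 1; a = 1; b = 1;     % illustrative values (assumed)
K = 1e5;
k = 0:K;
theta_k = theta ./ (a + b*(k + 1));

S1 = cumsum(theta_k);        % partial sums of theta_k: unbounded growth
S2 = cumsum(theta_k.^2);     % partial sums of theta_k^2: bounded

fprintf('sum theta_k   up to 1e3: %8.4f, up to 1e5: %8.4f\n', S1(1001), S1(end));
fprintf('sum theta_k^2 up to 1e3: %8.4f, up to 1e5: %8.4f\n', S2(1001), S2(end));
```

A finite computation cannot prove divergence, of course; the script only makes the different behaviors of the two series visible.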


This algorithm was initially used in order to solve equations involving 
random variables. Let us consider the iterations: 

\[ x^{(k+1)} = x^{(k)} - \theta_k Y^{(k)}, \]

where $Y^{(k)}$ is an approximation of $F(x^{(k)})$. By assuming that the error 
$\varepsilon^{(k)} = F(x^{(k)}) - Y^{(k)}$ is a centered random variable, independent from 
$\{x^{(0)}, ..., x^{(k)}, ...\}$, the algorithm converges to the solution $x$ of $F(x) = 0$. 

The convergence is established by the following theorem: 

Theorem 5.1.- Let $(\cdot, \cdot)$ be the usual scalar product of $\mathbb{R}^n$. Assume that: 

1) $F : \mathbb{R}^n \to \mathbb{R}^n$ is continuous and bounded: 

\[ \sup\{\|F(x)\| : x \in \mathbb{R}^n\} = M < \infty; \]

2) There exists $x \in \mathbb{R}^n$ such that $F(x) = 0$; 

3) Let, for $\delta > 0$, $A(x, \delta) = \{y \in \mathbb{R}^n : \|y - x\| \geq \delta\}$. Assume that there 
exists a function $m : \mathbb{R} \to \mathbb{R}$ such that, for any $\delta > 0$ and any $y \in A(x, \delta)$: 

\[ (y - x, F(y) - F(x)) \geq m(\delta) > 0; \]

4) The sequence $\{\theta_k\}_{k \in \mathbb{N}}$ verifies [5.21]. 

Under these assumptions, the sequence $\{x^{(k)}\}_{k \in \mathbb{N}}$ converges to 
$x$: $x^{(k)} \to x$ for $k \to +\infty$. ■ 

We observe that condition 3) implies the uniqueness of the solution, 
since $F(y) \neq F(x)$ for $y \neq x$. 

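Proof.- 

1) Let $T^{(k)} = x^{(k)} - x$. Then: 

\[ T^{(k+1)} = T^{(k)} - \theta_k \left( F(x^{(k)}) - F(x) \right) \]

and, since $F(x) = 0$, 

\[ \|T^{(k+1)}\|^2 = \|T^{(k)}\|^2 + \theta_k^2 \|F(x^{(k)})\|^2 - 2\theta_k \left( T^{(k)}, F(x^{(k)}) \right). \]

Let: 

\[ S = \sum_{k=0}^{+\infty} \theta_k^2 \]

and 

\[ z_k = \|T^{(k)}\|^2 + M^2 S - \sum_{i=0}^{k-1} \theta_i^2 \|F(x^{(i)})\|^2. \]

The assumptions on $F$ show that: 

\[ 0 \leq \sum_{i=0}^{+\infty} \theta_i^2 \|F(x^{(i)})\|^2 \leq M^2 \sum_{i=0}^{+\infty} \theta_i^2 = M^2 S, \]

which implies the existence of a real number $\Lambda \geq 0$ such that: 

\[ 0 \leq \Lambda \leq M^2 S \quad \text{and} \quad \sum_{i=0}^{+\infty} \theta_i^2 \|F(x^{(i)})\|^2 = \Lambda. \]

In addition, 

\[ z_k \geq 0, \ \forall k \in \mathbb{N}. \]

2) We also have: 

\[ z_{k+1} = \|T^{(k+1)}\|^2 + M^2 S - \sum_{i=0}^{k} \theta_i^2 \|F(x^{(i)})\|^2. \]

So, 

\[ z_{k+1} - z_k = \|T^{(k+1)}\|^2 - \|T^{(k)}\|^2 - \theta_k^2 \|F(x^{(k)})\|^2 = -2\theta_k \left( T^{(k)}, F(x^{(k)}) \right) \]

and we have: 

\[ z_{k+1} = z_k - 2\theta_k \left( T^{(k)}, F(x^{(k)}) \right). \]

Since, for $x^{(k)} \neq x$, 

\[ \left( T^{(k)}, F(x^{(k)}) \right) = \left( T^{(k)}, F(x^{(k)}) - F(x) \right) = \left( x^{(k)} - x, F(x^{(k)}) - F(x) \right) \geq m\left( \|x^{(k)} - x\| \right) > 0 \]

(and this scalar product is null for $x^{(k)} = x$), we obtain: 

\[ z_{k+1} \leq z_k, \ \forall k \in \mathbb{N}. \]

3) Thus, $\{z_k\}_{k \in \mathbb{N}}$ is decreasing and bounded from below. There exists a 
real number $z \geq 0$ such that: 

\[ z_k \to z \ \text{for} \ k \to +\infty. \]

4) Consequently, 

\[ \|T^{(k)}\|^2 = z_k - M^2 S + \sum_{i=0}^{k-1} \theta_i^2 \|F(x^{(i)})\|^2 \to B = z - M^2 S + \Lambda. \]

Since $\|T^{(k)}\|^2 \geq 0$, $\forall k \in \mathbb{N}$, we have $B \geq 0$. 

5) Assume that $B > 0$. Then, there exists $k_0 > 0$ such that: 

\[ k \geq k_0 \Longrightarrow \|T^{(k)}\|^2 \geq \frac{B}{2} \Longrightarrow \|T^{(k)}\| \geq \sqrt{\frac{B}{2}} > 0. \]

So, 

\[ k \geq k_0 \Longrightarrow \left( T^{(k)}, F(x^{(k)}) \right) = \left( x^{(k)} - x, F(x^{(k)}) - F(x) \right) \geq m\left( \sqrt{\frac{B}{2}} \right) > 0 \]

and, by taking 

\[ \lambda = 2\, m\left( \sqrt{\frac{B}{2}} \right) > 0, \]

we have: 

\[ k \geq k_0 \Longrightarrow z_{k+1} - z_k = -2\theta_k \left( T^{(k)}, F(x^{(k)}) \right) \leq -\lambda \theta_k. \]

This inequality shows that: 

\[ \forall k > k_0 : \ z_k - z_{k_0} = \sum_{i=k_0}^{k-1} (z_{i+1} - z_i) \leq -\lambda \sum_{i=k_0}^{k-1} \theta_i. \]

Since 

\[ 0 \leq z_{k+1} \leq z_k \leq z_0, \ \forall k \in \mathbb{N}, \]

this last inequality implies that: 

\[ \forall k > k_0 : \ 0 \leq \sum_{i=k_0}^{k-1} \theta_i \leq \frac{z_{k_0} - z_k}{\lambda} \leq \frac{z_0}{\lambda}. \]

6) But in this case: 

\[ \sum_{i=0}^{+\infty} \theta_i < +\infty, \]

which is in contradiction with the assumptions. So, $B = 0$ and the proof is 
complete. ■ 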

Matlab implementation is performed as follows. 

Listing 5.3. Robbins-Monro 

function X = Robbins_Monro_solution(f_iter, X_ini, theta_0, a, b, nitmax, errmax)
%
% determines X by Robbins-Monro iterations of function f_iter
% assumes that f_iter and X_ini have same dimension
% the update reads X + theta_k*f_iter(X): for the equation F(X) = 0,
% this corresponds to f_iter = -F
%
% IN:
% f_iter : iteration function - type anonymous function
% X_ini : initial vector of the coefficients - type array of double
% theta_0, a, b : coefficients - type double
% nitmax : max iteration number - type integer
% errmax : max precision - type double
%
% OUT:
% X : vector containing the solution - type array of double
%
X = X_ini;
nit = 0;
not_stop = 1;
while not_stop
    nit = nit + 1;
    theta_k = theta_fading(nit, a, b, theta_0);
    f_k = f_iter(X);
    X = X + theta_k*f_k;
    err = norm(f_k);
    not_stop = nit < nitmax && err > errmax;
end;
return;
end

function theta = theta_fading(k, a, b, theta_0)
%
% determines the coefficient of Robbins-Monro iteration
%
% IN:
% k : iteration number - type integer
% a, b, theta_0 : coefficients - type double
%
% OUT:
% theta : coefficient - type double
%
theta = theta_0/(a + b*(k + 1));
return;
end
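
The behavior of these iterations under noise may be observed on a toy problem. The script below is a sketch only: the equation $F(x) = x - 2 = 0$, the noise level sigma and the coefficients are hypothetical choices made for the illustration, and the loop is written inline (with the sign convention $x^{(k+1)} = x^{(k)} - \theta_k Y^{(k)}$) instead of calling the listing, so that it runs on its own.

```matlab
% Sketch: Robbins-Monro iterations for F(x) = x - 2 = 0 observed with noise.
F = @(x) x - 2;                       % hypothetical equation, root x = 2
sigma = 0.1;                          % assumed noise level
noisy_F = @(x) F(x) + sigma*randn(1); % Y^(k): noisy evaluation of F(x^(k))
theta_0 = 1; a = 1; b = 1;            % illustrative coefficients
nitmax = 5000;

x = 0;                                % x^(0)
for k = 1:nitmax
    theta_k = theta_0/(a + b*(k + 1));
    x = x - theta_k*noisy_F(x);       % x^(k+1) = x^(k) - theta_k*Y^(k)
end;
fprintf('approximate root: %.4f (exact: 2)\n', x);
```

With the function of Listing 5.3, the same computation would presumably be obtained by passing f_iter = @(x) -noisy_F(x); notice that, for noisy evaluations, the test on errmax is rarely satisfied and the iterations usually stop at nitmax.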





6 


Differential Equations Under Uncertainty 


From the mathematical standpoint, differential equations are essentially 
equations, so the methods presented in Chapters 4 and 5 may be used for 
uncertainty quantification in differential equations. The main difficulty lies in 
the number of unknowns to be determined. 

From the formal standpoint, we must keep in mind that the solutions of 
differential equations are not elements of a finite-dimensional space, but 
vectors of an infinite-dimensional one: while the solution of the linear 
system $AX = B$ (with $A \in \mathcal{M}(n,n)$ and $B \in \mathcal{M}(n,1)$) is a vector 
$X \in \mathcal{M}(n,1)$, the solution of the differential equation $x' = ax$ on $(0,T)$, 
$x(0) = x_0$ is a function $x(t) = x_0 \exp(at)$. Likewise, while the 
determination of $X$ consists of the determination of $n$ real numbers, the 
determination of $x$ requires the determination of $x(t)$ for each $t \in (0,T)$, 
i.e. the determination of infinitely many real numbers. This specificity has a 
significant impact on the complexity of the calculations connected to the 
problem of uncertainty quantification: in the case of a finite-dimensional 
linear system $AX = B$ where $A = A(v)$ and $B = B(v)$, we must 
determine $PX = \chi \varphi(\xi)$ with $\chi \in \mathcal{M}(n, N_X)$ (see Chapter 4); in the case 
of a differential equation $x' = ax$ on $(0,T)$, $x(0) = x_0$ with $a = a(v)$, we 
must determine $Px = \chi(t) \varphi(\xi)$, where $(0,T) \ni t \to \chi(t) \in \mathcal{M}(1, N_X)$ is a 
map, i.e. $\chi(t)$ must be determined for infinitely many values of $t$. 

In practice, differential equations are often solved by using discretizations 
that introduce finite-dimensional approximations and a finite number of 
unknown values, such as finite elements, etc. In this case, the complete 
determination of $x(t)$ is avoided and we limit ourselves to the determination of 
approximated values $x_1 \approx x(t_1), ..., x_n \approx x(t_n)$ ($0 < t_1 < ... < t_n = T$): 
the unknown becomes $X = (x_i) \in \mathcal{M}(n,1)$, which is the solution of a system 
of algebraic equations $F(X) = 0$. Such a system may be linear or 
nonlinear, according to the nature of the original equation and the method of 
discretization used. In both cases, the methods based on expansions 
previously presented may be used. Some discretization schemes lead to 
iterative or progressive methods, which lead naturally to the approach 
presented in section 5.1.4. 

The main difficulty arises, as mentioned above, from the so-called curse of 
dimensionality: the number of steps $n$ may be large (possibly very large). 
Analogously, $N_X$ may be large, since it grows rapidly with the number of 
random variables used in the approximation. So, the number of unknowns 
$nN_X$ may grow excessively and prevent effective numerical 
calculation, which is a practical limitation of the methods under 
consideration. 


6.1. The case of linear differential equations 

The remarks previously stated show that uncertainty quantification in 
linear differential equations is closely connected to uncertainty quantification 
in algebraic linear systems. For instance, let us consider the following 
ordinary differential equation: 

\[ \frac{dX}{dt} = AX + B, \quad X(0) = X_0, \qquad [6.1] \]

where $X: (0,T) \to \mathcal{M}(n,1)$ is the unknown to be determined, $A = A(v) \in 
\mathcal{M}(n,n)$, $B = B(v) \in \mathcal{M}(n,1)$, $X_0 = X_0(v) \in \mathcal{M}(n,1)$, and $v = (v_1, ..., v_{nr})$ 
is a random vector that models the uncertainties. In an analogous manner to 
that presented in section 4.1 (see Chapter 4), we look for an approximation 
$PX$ in: 

\[ S = \left[ \{\varphi_1(\xi), ..., \varphi_{N_X}(\xi)\} \right]^n = \left\{ \sum_{k=1}^{N_X} D_k \varphi_k(\xi) : D_k \in \mathbb{R}^n, \ 1 \leq k \leq N_X \right\}, \qquad [6.2] \]

\[ X \approx PX = \sum_{k=1}^{N_X} \chi_k \varphi_k(\xi) \quad \left( \text{i.e. } (PX)_j = \sum_{k=1}^{N_X} \chi_{jk} \varphi_k(\xi) \right), \qquad [6.3] \]

where $\xi$ is a convenient random variable, $F = \{\varphi_k\}_{k \in \mathbb{N}^*}$ is a conveniently chosen 
family of functions, $\varphi(\xi) = (\varphi_1(\xi), ..., \varphi_{N_X}(\xi))^t \in \mathcal{M}(N_X, 1)$ and $\chi = 
(\chi_{ij}) \in \mathcal{M}(n, N_X)$ is the unknown to be determined (here, a function of $t$). We have 

\[ X \approx PX = \chi \varphi(\xi). \qquad [6.4] \]

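Let us adopt, for instance, the variational standpoint. Analogously to 
sections 4.1 and 5.1.3, the differential equation [6.1] is approximated as: 

\[ PX \in S, \quad E\left( Y^t X(0) \right) = E\left( Y^t X_0 \right) \quad \text{and} \quad E\left( Y^t \frac{dPX}{dt} \right) = E\left( Y^t A\, PX \right) + E\left( Y^t B \right), \ \forall Y \in S. \qquad [6.5] \]

Recalling that 

\[ Y \in S \iff Y = D\varphi(\xi), \quad D = (D_{ij}) \in \mathcal{M}(n, N_X), \]

on the one hand, we have: 

\[ E\left( \varphi(\xi)^t D^t \frac{d\chi}{dt} \varphi(\xi) \right) = E\left( \varphi(\xi)^t D^t A \chi \varphi(\xi) \right) + E\left( \varphi(\xi)^t D^t B \right), \ \forall D \in \mathcal{M}(n, N_X); \]

and, on the other hand, 

\[ E\left( \varphi(\xi)^t D^t \chi(0) \varphi(\xi) \right) = E\left( \varphi(\xi)^t D^t X_0 \right). \qquad [6.6] \]

We have: 

\[ E\left( \varphi(v)^t D^t \frac{d\chi}{dt} \varphi(v) \right) = \sum_{i=1}^{n} \sum_{k,m=1}^{N_X} E\left( \varphi_m(v) D_{im} \varphi_k(v) \right) \frac{d\chi_{ik}}{dt}, \qquad [6.7] \]

\[ E\left( \varphi(v)^t D^t A \chi \varphi(v) \right) = \sum_{i,j=1}^{n} \sum_{k,m=1}^{N_X} E\left( \varphi_m(v) D_{im} A_{ij} \varphi_k(v) \right) \chi_{jk} \]

and: 

\[ E\left( \varphi(v)^t D^t B \right) = \sum_{i=1}^{n} \sum_{m=1}^{N_X} E\left( \varphi_m(v) D_{im} B_i \right). \]

So, by taking $D_{im} = \delta_{ir}\delta_{ms}$, equation [4.6] leads to, for $1 \leq r \leq n$, $1 \leq 
s \leq N_X$: 

\[ \sum_{k=1}^{N_X} \left[ E\left( \varphi_s(v) \varphi_k(v) \right) \frac{d\chi_{rk}}{dt} - \sum_{j=1}^{n} E\left( \varphi_s(v) A_{rj} \varphi_k(v) \right) \chi_{jk} \right] = E\left( \varphi_s(v) B_r \right). \qquad [6.8] \]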


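Let: 

\[ \mathcal{M}_{rsjk} = \delta_{jr} E\left( \varphi_s(v) \varphi_k(v) \right), \quad \mathcal{A}_{rsjk} = E\left( \varphi_s(v) A_{rj} \varphi_k(v) \right), \quad \mathcal{C}_{rs}(B) = E\left( \varphi_s(v) B_r \right). \qquad [6.9] \]

We have, for $1 \leq r \leq n$, $1 \leq s \leq N_X$: 

\[ \sum_{j=1}^{n} \sum_{k=1}^{N_X} \left[ \mathcal{M}_{rsjk} \frac{d\chi_{jk}}{dt} - \mathcal{A}_{rsjk} \chi_{jk} \right] = \mathcal{C}_{rs}(B). \]

Let us consider: 

\[ ind(j, k) = (k - 1)n + j, \qquad [6.10] \]

and the matrices $M = (M_{\alpha\beta})$, $N = (N_{\alpha\beta}) \in \mathcal{M}(nN_X, nN_X)$, $Y = (Y_\beta)$ 
and $Q = (Q_\beta) \in \mathcal{M}(nN_X, 1)$ such that: 

\[ M_{\alpha\beta} = \mathcal{M}_{rsjk}, \quad N_{\alpha\beta} = \mathcal{A}_{rsjk}, \quad Q_\alpha = \mathcal{C}_{rs}(B), \quad Y_\beta = \chi_{jk}, \quad \alpha = ind(r, s), \ \beta = ind(j, k). \qquad [6.11] \]

Then, we have: 

\[ M \frac{dY}{dt} = NY + Q. \qquad [6.12] \]

In addition: 

\[ E\left( \varphi(v)^t D^t X_0 \right) = \sum_{i=1}^{n} \sum_{m=1}^{N_X} E\left( \varphi_m(v) D_{im} X_{0i} \right), \]

\[ E\left( \varphi(\xi)^t D^t \chi(0) \varphi(\xi) \right) = \sum_{i=1}^{n} \sum_{k,m=1}^{N_X} E\left( \varphi_m(v) D_{im} \varphi_k(v) \right) \chi_{ik}(0) \]

and equation [6.6] shows that, for $1 \leq r \leq n$, $1 \leq s \leq N_X$, 

\[ \sum_{k=1}^{N_X} E\left( \varphi_s(v) \varphi_k(v) \right) \chi_{rk}(0) = E\left( \varphi_s(v) X_{0r} \right). \]

Thus, for $1 \leq r \leq n$, $1 \leq s \leq N_X$, 

\[ \sum_{j=1}^{n} \sum_{k=1}^{N_X} \mathcal{M}_{rsjk} \chi_{jk}(0) = \mathcal{C}_{rs}(X_0). \]

So, by setting $Z_0 = (Z_{0\beta}) \in \mathcal{M}(nN_X, 1)$ such that: 

\[ Z_{0\alpha} = \mathcal{C}_{rs}(X_0), \quad \alpha = ind(r, s), \qquad [6.13] \]

we obtain: 

\[ M Y(0) = Z_0. \qquad [6.14] \]

Equations [6.12] and [6.14] form a linear system of differential equations 
that may be solved in order to determine the unknown $Y$. Once $Y$ is determined, 
we may construct $\chi$ by using equation [6.11], and finally, $PX$. 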
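
The construction of [6.12] and [6.14] may be illustrated on the scalar equation $x' = -vx$, $x(0) = 1$ ($n = 1$), whose exact solution is $x(t) = \exp(-vt)$. The script below is only a sketch under stated assumptions: $v$ uniform on $(0,1)$, a monomial basis $\varphi_k(v) = v^{k-1}$, $N_X = 4$ and means estimated on a midpoint sample. None of these choices comes from the text and the names are local to the script.

```matlab
% Sketch: equations [6.12] and [6.14] for x' = -v x, x(0) = 1 (n = 1),
% with v uniform on (0,1) and the monomial basis phi_k(v) = v^(k-1).
% All numerical choices below are illustrative assumptions.
N_X = 4;                               % order of the expansion
ns  = 200;                             % number of sample points
vs  = ((1:ns) - 0.5)/ns;               % midpoint sample of v on (0,1)
phi = @(k, v) v.^(k - 1);
a   = @(v) -v;                         % coefficient A(v)
x0  = @(v) ones(size(v));              % initial data X_0(v)

M = zeros(N_X); N = zeros(N_X); Z0 = zeros(N_X, 1);
for s = 1:N_X
    for k = 1:N_X
        M(s, k) = mean(phi(s, vs).*phi(k, vs));          % E(phi_s phi_k)
        N(s, k) = mean(phi(s, vs).*a(vs).*phi(k, vs));   % E(phi_s A phi_k)
    end;
    Z0(s) = mean(phi(s, vs).*x0(vs));                    % E(phi_s X_0)
end;

Y0 = M \ Z0;                           % initial condition [6.14]
F  = @(t, Y) M \ (N*Y);                % equation [6.12] with Q = 0
[tt, YY] = ode45(F, [0 0.5 1], Y0);

chi = YY(end, :);                      % coefficients at t = 1
PX  = zeros(size(vs));
for k = 1:N_X
    PX = PX + chi(k)*phi(k, vs);
end;
err = max(abs(PX - exp(-vs)));         % comparison with the exact solution
fprintf('max error at t = 1: %.2e\n', err);
```

For $n > 1$, the same steps apply after the renumbering [6.10]; the monomial basis is used here only for brevity and becomes ill-conditioned when $N_X$ grows.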

Matlab implementation is performed as follows: assume that, on the one 
hand, the value of $\varphi_k(v)$ is furnished by the subprogram phi(k,v), while, on 
the other hand, the function f(t,X,v) is evaluated by a subprogram 
f(t,X,v). Similarly to the preceding situations, the evaluation of the means 
involved in the equations is requested and they correspond to scalar products, 
which may be evaluated by integration or using a sample. Here, we assume 
that a sample of ns variates from v is available; using the evaluation of the 
scalar products by numerical integration is a simple modification of the code 
below. 

Listing 6.1. Linear differential equation 

function chi = expcoef(A, B, phi, n, N_X, vs, X0, time_span)
%
% determines the coefficients of the expansion
% by using a sample
%
% IN:
% A : n x n matrix of the linear system - type anonymous function
% B : n x 1 matrix of the linear system - type anonymous function
% phi : basis function phi(k,v) - type anonymous function
% n : number of unknowns - type integer
% N_X : order of the expansion - type integer
% vs : table of values of v - type array of double
%      vs(:, i) is a variate from v
% X0 : n x 1 initial vector - type anonymous function
% time_span : ntimes x 1 vector of time moments where the solution
%      has to be evaluated - type array of double
%
% OUT:
% chi : ntimes x n x N_X matrix of the coefficients - type array of double
%
aux = fixed_variational_matrix(phi, vs, N_X);
M = tab4(aux, n, N_X);
[N, Q] = variational_matrices(A, B, phi, n, N_X, vs);
F = @(t, U) second_member(M, N, Q, t, U);
% projection of the initial data: PX0(r,s) = E(phi_s(v) X0_r(v))
PX0 = zeros(n, N_X);
w = map(X0, vs, n);
for r = 1:n
    Y = w(r, :);
    for s = 1:N_X
        f1 = @(U) phi(s, U);
        Z = map(f1, vs, 1);
        aux = scalprod(Y, Z);
        PX0(r, s) = aux;
    end;
end;
Y0 = zeros(n*N_X, 1);
for r = 1:n
    for s = 1:N_X
        alfa = index_map(r, s, n, N_X);
        Y0(alfa) = PX0(r, s);
    end;
end;
%
U0 = M \ Y0;
[TT, UU] = ode45(F, time_span, U0);
%
chi = zeros(length(time_span), n, N_X);
for i = 1:length(time_span)
    aux = UU(i, :)';
    auxx = zeros(n, N_X);
    for r = 1:n
        for s = 1:N_X
            alfa = index_map(r, s, n, N_X);
            auxx(r, s) = aux(alfa);
        end;
    end;
    chi(i, :, :) = auxx;
end;
return;
end

function w = second_member(M, N, Q, t, U)
% second member of M dU/dt = N U + Q
aux = N*U + Q;
w = M \ aux;
return;
end


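Example 6.1.- Let $n = 2$ and the differential equation: 

\[ \frac{dX}{dt} = \begin{pmatrix} 0 & -v_1 \\ v_1 & 0 \end{pmatrix} X, \quad X(0) = \begin{pmatrix} v_2 \\ 0 \end{pmatrix}, \quad \text{on } (0, T). \]

The exact solution is 

\[ X(t) = \begin{pmatrix} v_2 \cos(v_1 t) \\ v_2 \sin(v_1 t) \end{pmatrix}. \]

Let us assume that $v = (v_1, v_2)$ is a pair of independent random variables 
such that $v_1$ is uniformly distributed on $(a_1, b_1)$ and $v_2$ is uniformly distributed 
on $(a_2, b_2)$. We consider the basis ($0 \leq i \leq n_1$, $0 \leq j \leq n_2$): 

\[ \varphi_k(v) = \left( \frac{v_1 - a_1}{b_1 - a_1} \right)^i \left( \frac{v_2 - a_2}{b_2 - a_2} \right)^j, \quad k = (j - 1)n_1 + i. \]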





Figure 6.1. Results for t = 0.5 in Example 6.1 ($n_1 = n_2 = 2$). For a color 
version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip 

Figure 6.2. Results for t = 1 in Example 6.1 ($n_1 = n_2 = 2$). For a color 
version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip 

Figure 6.3. Results for Example 6.1 with nts = 10 ($n_1 = n_2 = 2$); panel title: 
example of temporal prediction for component 1. For a color 
version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip 

Figure 6.4. Results for Example 6.1 with nts = 40 ($n_1 = n_2 = 2$); panel titles: 
example of temporal prediction for component 1 and for component 2. For a color 
version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip 


The results obtained for $T = 1$, $a_1 = 1.8\pi$, $b_1 = 2.2\pi$, $a_2 = -0.1$, 
$b_2 = 0.1$ and $n_1 = n_2 = 2$ are shown in Figures 6.1-6.4. The differential 
equation formed by equations [6.12] and [6.14] has been solved by using the 
internal Matlab function ode45 and the solution has been evaluated at nts time 
instants corresponding to $t = \Delta t, 2\Delta t, ..., nts\,\Delta t = 1$, $\Delta t = T/nts$. The 
means have been evaluated by numerical integration using the internal function 
quad of Matlab. For nts = 10, the global mean quadratic error is about 1E-4 
and the associated relative error is lower than 0.3%. Figures 6.1-6.2 compare 
the random variables $X(t)$, $PX(t)$ for fixed values of the time $t$ and all the 
values of $v$, while Figure 6.3 compares the prediction $PX(t)$ to the exact value 
$X(t)$ at a given point $v$ and all the values of $t$. The results are analogous for 
nts = 40. Figure 6.4 shows an example of result for nts = 40. 


6.2. The case of nonlinear differential equations 

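The method presented in the previous section may be adapted to nonlinear 
differential equations. Let us consider: 

\[ \frac{dX}{dt} = f(t, X), \quad X(0) = X_0. \qquad [6.15] \]

In this case, we have: 

\[ PX \in S, \quad E\left( Y^t X(0) \right) = E\left( Y^t X_0 \right) \quad \text{and} \quad E\left( Y^t \frac{dPX}{dt} \right) = E\left( Y^t f(t, PX) \right), \ \forall Y \in S. \qquad [6.16] \]

Thus, equation [6.16] can be written as: 

\[ E\left( \varphi(\xi)^t D^t \frac{d\chi}{dt} \varphi(\xi) \right) = E\left( \varphi(\xi)^t D^t f(t, \chi\varphi(\xi)) \right), \ \forall D \in \mathcal{M}(n, N_X); \qquad [6.17] \]

and, since: 

\[ E\left( \varphi(v)^t D^t f(t, \chi\varphi(\xi)) \right) = \sum_{i=1}^{n} \sum_{m=1}^{N_X} E\left( \varphi_m(v) D_{im} f_i(t, \chi\varphi(\xi)) \right), \]

we have: 

\[ \sum_{i=1}^{n} \sum_{k,m=1}^{N_X} E\left( \varphi_m(v) D_{im} \varphi_k(v) \right) \frac{d\chi_{ik}}{dt} = \sum_{i=1}^{n} \sum_{m=1}^{N_X} E\left( \varphi_m(v) D_{im} f_i(t, \chi\varphi(\xi)) \right). \]

So, by taking $D_{im} = \delta_{ir}\delta_{ms}$, for $1 \leq r \leq n$, $1 \leq s \leq N_X$, we have: 

\[ \sum_{k=1}^{N_X} E\left( \varphi_s(v) \varphi_k(v) \right) \frac{d\chi_{rk}}{dt} = E\left( \varphi_s(v) f_r(t, \chi\varphi(\xi)) \right). \qquad [6.18] \]

Let us consider $Y$ defined in equation [6.11] and: 

\[ F_\alpha(t, Y) = E\left( \varphi_s(v) f_r(t, \chi\varphi(\xi)) \right), \quad \alpha = ind(r, s). \]

Then, we have: 

\[ M \frac{dY}{dt} = F(t, Y), \qquad [6.19] \]

with the initial condition [6.14]. This approach is implemented in Matlab as 
follows. 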

Listing 6.2. Nonlinear differential equation 

function chi = expcoef(f, phi, n, N_X, vs, X0, time_span)
%
% determines the coefficients of the expansion
% by using a sample
%
% IN:
% f : second member f(t,x,v) - type anonymous function
% phi : basis function phi(k,v) - type anonymous function
% n : number of unknowns - type integer
% N_X : order of the expansion - type integer
% vs : table of values of v - type array of double
%      vs(:, i) is a variate from v
% X0 : n x 1 initial vector - type anonymous function
% time_span : ntimes x 1 vector of time moments where the solution
%      has to be evaluated - type array of double
%
% OUT:
% chi : ntimes x n x N_X matrix of the coefficients - type array of double
%
aux = fixed_variational_matrix(phi, vs, N_X);
M = tab4(aux, n, N_X);
F = @(t, U) second_member(f, t, U, vs, phi, M, n, N_X);
% projection of the initial data: PX0(r,s) = E(phi_s(v) X0_r(v))
PX0 = zeros(n, N_X);
w = map(X0, vs, n);
for r = 1:n
    Y = w(r, :);
    for s = 1:N_X
        f1 = @(U) phi(s, U);
        Z = map(f1, vs, 1);
        aux = scalprod(Y, Z);
        PX0(r, s) = aux;
    end;
end;
Z = zeros(n*N_X, 1);
for r = 1:n
    for s = 1:N_X
        alfa = index_map(r, s, n, N_X);
        Z(alfa) = PX0(r, s);
    end;
end;
%
Y0 = M \ Z;
[T, Y] = ode45(F, time_span, Y0);
%
chi = zeros(length(time_span), n, N_X);
for i = 1:length(time_span)
    aux = Y(i, :)';
    auxx = zeros(n, N_X);
    for r = 1:n
        for s = 1:N_X
            alfa = index_map(r, s, n, N_X);
            auxx(r, s) = aux(alfa);
        end;
    end;
    chi(i, :, :) = auxx;
end;
return;
end

function w = second_member(f, t, U, vs, phi, M, n, N_X)
% second member of M dU/dt = F(t, U)
chi = zeros(n, N_X);
for r = 1:n
    for s = 1:N_X
        alfa = index_map(r, s, n, N_X);
        chi(r, s) = U(alfa);
    end;
end;
ff = @(X, v) f(t, X, v);
w = function_proj(ff, chi, vs, phi);
aux = mean(w, 1);
w = M \ aux';
return;
end


Example 6.2.- Let us consider $n = 1$ and the logistic equation: 

\[ \frac{dx}{dt} = ax(1 - x), \quad x(0) = x_0. \]

Its exact solution is: 

\[ x(t) = \frac{x_0 \exp(at)}{1 - x_0 + x_0 \exp(at)}. \]

Let us assume that $v = (a, x_0)$ is a pair of independent random variables 
such that $a$ is uniformly distributed on $(0, 1)$, while $x_0$ is uniformly 
distributed on $(0.5, 1.5)$. The differential equation [6.19] with initial condition 
[6.14] has been numerically solved using the function ode45 of Matlab and 
the solution has been evaluated at the times $t_i = i\Delta t$, $i = 1, ..., nts$, 
$\Delta t = T/nts$. The means have been determined by numerical integration 
using the Matlab function quad. Figure 6.5 shows an example of result for 
nts = 10. 
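
The exact solution used as a reference in this example may be checked against ode45 for a single realization of $v = (a, x_0)$. The script below is a sketch: the values a = 0.7 and x0 = 0.8, the tolerances and T = 1 are arbitrary assumptions for the illustration, and no expansion is involved.

```matlab
% Sketch: comparison of ode45 with the exact solution of the logistic equation
% for one realization of v = (a, x0); the values are illustrative assumptions.
a = 0.7; x0 = 0.8;                       % one variate of (a, x0)
T = 1; nts = 10;
t = (0:nts)'*T/nts;                      % t_i = i*Delta_t (t_0 = 0 included)

f = @(t, x) a*x.*(1 - x);                % second member of the equation
opts = odeset('RelTol', 1e-8, 'AbsTol', 1e-10);
[tt, x_num] = ode45(f, t, x0, opts);

x_exact = x0*exp(a*t)./(1 - x0 + x0*exp(a*t));
err = max(abs(x_num - x_exact));
fprintf('max |x_num - x_exact| = %.2e\n', err);
```

Such a pointwise check is useful before comparing $PX(t)$ with $X(t)$, since it separates the error of the time integration from the error of the expansion.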



Figure 6.5. Results obtained in Example 6.2 ($n_1 = n_2 = 2$) 

Figure 6.6. Results obtained in Example 6.2 ($n_1 = n_2 = 2$); panel titles: 
example of temporal prediction ($n_1 = n_2 = 2$) 


6.3. The case of partial differential equations 
6.3.1. Linear equations 

Partial differential equations (PDEs) may be studied by using the same 
methods, in particular when they are written in a variational form. For 
instance, let us consider a functional space $V$, 
a bilinear application: 

\[ V \times V \ni (x, y) \to a(x, y) \in \mathcal{M}(1, 1), \]

a linear functional: 

\[ V \ni y \to b(y) \in \mathcal{M}(1, 1), \]

and the variational equation: 

\[ x \in V \ \text{and} \ a(x, y) = b(y), \ \forall y \in V. \]

Usually, the discretization of this last equation involves a finite family 
$\{\psi_i\}_{1 \leq i \leq n}$ (such as, for instance, finite elements) and the subspace 
$V_n = [\{\psi_i\}_{1 \leq i \leq n}]$. The variational equation is discretized as: 

\[ x \in V_n \ \text{and} \ a(x, y) = b(y), \ \forall y \in V_n. \]

Let us consider $x = \sum_{i=1}^{n} x_i \psi_i$, $y = \sum_{i=1}^{n} y_i \psi_i$ and: 

\[ A_{ij} = a(\psi_j, \psi_i); \quad B_i = b(\psi_i); \quad X = (x_1, ..., x_n)^t; \quad Y = (y_1, ..., y_n)^t. \]

In this case, the approximated linear variational equation becomes: 

\[ X \in \mathbb{R}^n \ \text{and} \ Y^t A X = Y^t B, \ \forall Y \in \mathbb{R}^n, \]

which corresponds to the linear system: 

\[ AX = B. \]
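
The passage from the variational equation to the linear system may be illustrated on a one-dimensional model problem. The script below is only a sketch under assumptions that are not in the text: $a(x, y) = \int_0^1 x'y'\,ds$, $b(y) = \int_0^1 y\,ds$, $V$ formed by functions vanishing at 0 and 1, and $\psi_i$ given by piecewise linear "hat" functions on a uniform mesh. For this problem, the exact solution is $s(1 - s)/2$.

```matlab
% Sketch: assembling A_ij = a(psi_j, psi_i) and B_i = b(psi_i) for
%   a(x, y) = int_0^1 x'(s) y'(s) ds,   b(y) = int_0^1 y(s) ds,
% with piecewise linear ("hat") functions psi_i on a uniform mesh and
% x(0) = x(1) = 0. The model problem is an illustrative assumption.
n = 9;                         % number of interior nodes
h = 1/(n + 1);                 % mesh size
s = (1:n)'*h;                  % interior nodes

A = zeros(n, n);
B = zeros(n, 1);
for i = 1:n
    A(i, i) = 2/h;             % a(psi_i, psi_i)
    if i < n
        A(i, i + 1) = -1/h;    % a(psi_{i+1}, psi_i)
        A(i + 1, i) = -1/h;    % a(psi_i, psi_{i+1})
    end;
    B(i) = h;                  % b(psi_i): area under the hat function
end;

X = A \ B;                     % solution of AX = B
x_exact = s.*(1 - s)/2;        % solution of -x'' = 1, x(0) = x(1) = 0
err = max(abs(X - x_exact));
fprintf('max nodal error: %.2e\n', err);
```

When $a$ or $b$ depend on a random vector $v$, the entries $A_{ij}(v)$ and $B_i(v)$ assembled in this way are the data of the linear system to which the method of section 4.1 is applied.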


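Applying the method presented in Chapter 4 (section 4.1), we have: 

\[ \sum_{j=1}^{n} \sum_{k=1}^{N_X} \mathcal{A}_{rsjk} \chi_{jk} = \mathcal{B}_{rs}, \quad 1 \leq r \leq n, \ 1 \leq s \leq N_X, \]

where: 

\[ \mathcal{A}_{rsjk} = E\left( \varphi_s(v) A_{rj} \varphi_k(v) \right), \quad \mathcal{B}_{rs} = E\left( \varphi_s(v) B_r \right). \]

Thus, by using the same variables (see section 4.1) 
$M = (M_{\alpha\beta}) \in \mathcal{M}(nN_X, nN_X)$, $U = (U_\beta) \in \mathcal{M}(nN_X, 1)$, 
$N = (N_\alpha) \in \mathcal{M}(nN_X, 1)$ given by: 

\[ M_{\alpha\beta} = \mathcal{A}_{rsjk}, \quad N_\alpha = \mathcal{B}_{rs}, \quad U_\beta = \chi_{jk}, \quad \alpha = ind(r, s), \ \beta = ind(j, k), \]

we have again: 

\[ MU = N. \]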


6.3.2. Nonlinear equations 

Let us assume that the application $(x, y) \to a(x, y)$ is not bilinear, but 
that only $y \to a(x, y)$ is linear. In this case, the approximated variational 
equation can be written as: 

\[ X \in \mathbb{R}^n \ \text{and} \ Y^t F(X) = 0, \ \forall Y \in \mathbb{R}^n, \]

where: 

\[ F_i(X) = a\left( \sum_{k=1}^{n} x_k \psi_k, \psi_i \right) - b(\psi_i), \]

i.e.: 

\[ X \in \mathbb{R}^n \ \text{and} \ F(X) = 0. \]

Let us apply the method introduced in Chapter 5 (section 5.1.3): we have: 

\[ \mathcal{F}_{rs}(\chi) = 0, \quad 1 \leq r \leq n, \ 1 \leq s \leq N_X, \]

where: 

\[ \mathcal{F}_{rs}(\chi) = E\left( \varphi_s(v) F_r(\chi\varphi(v)) \right). \]

Thus, by introducing the same vectors $G = (G_\alpha) \in \mathcal{M}(nN_X, 1)$, $U = 
(U_\beta) \in \mathcal{M}(nN_X, 1)$ such that: 

\[ G_\alpha = \mathcal{F}_{rs}, \quad U_\beta = \chi_{jk}, \quad \alpha = ind(r, s), \ \beta = ind(j, k), \]

we have: 

\[ G(U) = 0. \]


6.3.3. Evolution equations 

Let us consider a variational equation having a time dependence: 

\[ x \in V \ \text{and} \ \lambda\left( \frac{dx}{dt}, y \right) + a(x, y) = b(t, y), \ \forall y \in V \ \text{and} \ t \in (0, T), \]

where $\lambda$ is bilinear. Let: 

\[ \Lambda_{ij} = \lambda(\psi_j, \psi_i). \]

Since: 

\[ \lambda\left( \frac{dx}{dt}, y \right) = \sum_{i,j=1}^{n} y_i \Lambda_{ij} \frac{dx_j}{dt} = Y^t \Lambda \frac{dX}{dt}, \]

analogously to the preceding situations, we have: 

\[ Y^t \Lambda \frac{dX}{dt} + Y^t \mathcal{F}(t, X) = 0, \ \forall Y \in \mathbb{R}^n, \]

where: 

\[ \mathcal{F}_i(t, X) = a\left( \sum_{k=1}^{n} x_k \psi_k, \psi_i \right) - b_i. \]

This variational equality leads to differential equations defining the 
approximation. In the following, we consider the linear and the nonlinear 
situations. 


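6.3.3.1. The linear situation 

For linear equations, the approximated variational equation can be written 
as 

\[ \Lambda \frac{dX}{dt} + AX = B. \]

Let us introduce 

\[ \mathcal{M}_{rsjk} = E\left( \varphi_s(v) \Lambda_{rj} \varphi_k(v) \right), \quad \mathcal{A}_{rsjk} = E\left( \varphi_s(v) A_{rj} \varphi_k(v) \right), \quad \mathcal{C}_{rs}(B) = E\left( \varphi_s(v) B_r \right). \]

We have, for $1 \leq r \leq n$, $1 \leq s \leq N_X$, 

\[ \sum_{j=1}^{n} \sum_{k=1}^{N_X} \left[ \mathcal{M}_{rsjk} \frac{d\chi_{jk}}{dt} + \mathcal{A}_{rsjk} \chi_{jk} \right] = \mathcal{C}_{rs}(B). \]

Thus, using the same variables $M = (M_{\alpha\beta})$, $N = (N_{\alpha\beta}) \in \mathcal{M}(nN_X, nN_X)$, 
$Q = (Q_\beta)$, $U = (U_\beta) \in \mathcal{M}(nN_X, 1)$, such that 

\[ M_{\alpha\beta} = \mathcal{M}_{rsjk}, \quad N_{\alpha\beta} = -\mathcal{A}_{rsjk}, \quad Q_\alpha = \mathcal{C}_{rs}(B), \quad U_\beta = \chi_{jk}, \quad \alpha = ind(r, s), \ \beta = ind(j, k), \]

we have: 

\[ M \frac{dU}{dt} = NU + Q. \qquad [6.21] \]

6.3.3.2. The nonlinear situation 

In the nonlinear case, the approximated variational equation can be written 
as: 

\[ \Lambda \frac{dX}{dt} = f(t, X), \]

where: 

\[ f_i(t, X) = b_i - a\left( \sum_{k=1}^{n} x_k \psi_k, \psi_i \right). \]

Thus, by considering: 

\[ F_\alpha(t, U) = E\left( \varphi_s(v) f_r(t, \chi\varphi(\xi)) \right), \quad \alpha = ind(r, s), \]

we have: 

\[ M \frac{dU}{dt} = F(t, U). \]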

6.4. Reduction of Hamiltonian systems 

In this section, we consider Hamiltonian systems, i.e. 
differential equations corresponding to Hamilton's equations of a system. In 
this case, stochastic methods may be used in order to reduce the number of 
unknowns and we may generate equations intended to furnish approximate 
values of a few variables (possibly a single one) without determining the 
others. 


6.4.1. Hamiltonian systems 

Hamiltonians are usually introduced by using transformations of 
Lagrangians. Let us consider a mechanical system described by the 
coordinates $q = (q_1, ..., q_n)$. Let us denote by $t$ the time variable and by 
$\dot{q} = dq/dt$ the time derivatives. Assume that the system has a kinetic energy 
$K = K(t, q, \dot{q})$ and total potential energy $V = V(t, q)$. The Lagrangian is: 

\[ L(t, q, \dot{q}) = K(t, q, \dot{q}) - V(t, q). \qquad [6.22] \]

The action $A$ on a time interval $(0, \tau)$ is given by: 

\[ A(q) = \int_0^\tau L(t, q, \dot{q})\, dt, \]

and the equations of motion are written in variational form: 

\[ DA(q)(\delta q) = 0, \ \forall \delta q \ \text{such that} \ \delta q(0) = \delta q(\tau) = 0, \]

where $DA$ denotes the Gateaux derivative of $A$. This variational equation 
corresponds to the Euler-Lagrange equations: 

\[ \frac{d}{dt}\left( \frac{\partial L}{\partial \dot{q}_i} \right) = \frac{\partial L}{\partial q_i}, \quad 1 \leq i \leq n. \qquad [6.23] \]

Let us introduce the transformation: 

\[ T : (q, \dot{q}) \to (Q, P), \qquad [6.24] \]

which is given by 

\[ Q(q, \dot{q}) = q; \quad P(q, \dot{q}) = p; \quad p_i = \frac{\partial L}{\partial \dot{q}_i}, \quad 1 \leq i \leq n. \qquad [6.25] \]

$p$ is the generalized momentum. Equations [6.23] can be written as: 

\[ p_i = \frac{\partial L}{\partial \dot{q}_i}, \quad \dot{p}_i = \frac{\partial L}{\partial q_i}, \quad 1 \leq i \leq n. \qquad [6.26] \]

Let us denote by $T^{-1} = (Q^{-1}, P^{-1})$ the inverse transformation associated 
with $T$ (equation [6.24]). The Hamiltonian $H$ of the system is obtained by a 
Legendre transformation of the Lagrangian $L$: 

\[ H(t, q, p) = p^t P^{-1}(q, p) - L(t, q, P^{-1}(q, p)). \qquad [6.27] \]

This transformation may be interpreted as a Fenchel conjugate of $L$ with 
respect to $\dot{q}$: 

\[ H(t, q, s) = \sup\left\{ s^t x - L(t, q, x) : x \in \mathbb{R}^n \right\}. \]

By differentiating the expression $s^t x - L(t, q, x)$ with respect to $x_i$, the reader 
may easily verify that the maximum is attained for $s_i = \partial L/\partial \dot{q}_i = p_i$. We 
have, on the one hand: 

\[ H(t, q, p) = p^t \dot{q} - L(t, q, \dot{q}); \quad \dot{q} = P^{-1}(q, p). \]

Thus: 

\[ \frac{\partial H}{\partial p_i} = \dot{q}_i + \sum_{j=1}^{n} \underbrace{\left( p_j - \frac{\partial L}{\partial \dot{q}_j} \right)}_{=0} \frac{\partial \dot{q}_j}{\partial p_i} = \dot{q}_i. \qquad [6.28] \]

On the other hand: 

\[ L(t, q, \dot{q}) = p^t \dot{q} - H(t, q, p); \quad p = P(q, \dot{q}). \]

Thus, equation [6.28] shows that: 

\[ \frac{\partial L}{\partial q_i} = -\frac{\partial H}{\partial q_i} + \sum_{j=1}^{n} \underbrace{\left( \dot{q}_j - \frac{\partial H}{\partial p_j} \right)}_{=0} \frac{\partial p_j}{\partial q_i} = -\frac{\partial H}{\partial q_i} \]

and equation [6.26] yields that 

\[ \dot{p}_i = -\frac{\partial H}{\partial q_i}. \]

So, the equations of motion may be written as: 

\[ \dot{q}_i = \frac{\partial H}{\partial p_i}, \quad \dot{p}_i = -\frac{\partial H}{\partial q_i}, \quad 1 \leq i \leq n. \qquad [6.29] \]

These are the equations of motion in Hamiltonian form, or Hamilton's 
equations of motion. 

6.4.2. Reduction of autonomous Hamiltonian systems 


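Hamiltonians have many interesting properties such as: 

\[ \frac{dH}{dt} = \frac{\partial H}{\partial t} + \sum_{i=1}^{n} \left( \frac{\partial H}{\partial q_i} \dot{q}_i + \frac{\partial H}{\partial p_i} \dot{p}_i \right) = \frac{\partial H}{\partial t} + \sum_{i=1}^{n} \left( -\dot{p}_i \dot{q}_i + \dot{q}_i \dot{p}_i \right) = \frac{\partial H}{\partial t}. \]

As a result, a Hamiltonian that does not contain explicit dependence on 
time is constant on the motion, i.e.: 

\[ \frac{\partial H}{\partial t} = 0 \Longrightarrow \frac{dH}{dt} = 0. \qquad [6.30] \]

A system is said to be autonomous if its equations of motion do not 
contain explicit dependence on time. For instance, a system such that 
$K = K(q, \dot{q})$ and $V = V(q)$ has a Lagrangian $L(q, \dot{q}) = K(q, \dot{q}) - V(q)$, 
and consequently, a Hamiltonian $H(q, p) = p^t P^{-1}(q, p) - 
L(q, P^{-1}(q, p))$, which is independent of time. For such a system, equation 
[6.30] shows that $H$ is time-invariant and 

\[ f(p, q) = \exp\left( -\lambda H(q, p) \right) \]

defines a strictly positive time-invariant function: 

\[ f(p, q) > 0 \quad \text{and} \quad \frac{df}{dt} = 0. \]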

These properties make / (p, q) suitable in order to define an invariant 
probability: 


<p( q- p) = j /(q. p) > 

on the phase space (A is a normalizing factor). In addition: 

= ~ x f (p. q) = x f (p. q) i>» 

df \f( \ dH 
dpi = -A/(p,q) g^ = - 


x f (Pi q) Qi- 
Let us consider $\Omega = \{1, ..., n\}$ and two disjoint subsets $I = \{i_1, ..., i_m\}$,
$J = \Omega - I = \{j_1, ..., j_k\}$ ($k = n - m$). Assume that we are interested only in
the variables $\{(q_i, p_i) : i \in I\}$ and we want to eliminate the variables
$\{(q_j, p_j) : j \in J\}$. We denote:

$$q_I = (q_{i_1}, q_{i_2}, ..., q_{i_m}), \quad p_I = (p_{i_1}, p_{i_2}, ..., p_{i_m})$$

and, analogously,

$$q_J = (q_{j_1}, q_{j_2}, ..., q_{j_k}), \quad p_J = (p_{j_1}, p_{j_2}, ..., p_{j_k}).$$


A simple idea for obtaining equations that involve only the variables of
interest $(q_I, p_I)$ was introduced by Chorin and consists of considering
the approximated equations:

$$\frac{dq_i}{dt} = E\left( \frac{\partial H}{\partial p_i} \,\Big|\, q_I, p_I \right), \quad \frac{dp_i}{dt} = -E\left( \frac{\partial H}{\partial q_i} \,\Big|\, q_I, p_I \right), \quad i \in I,$$  [6.31]


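where the conditional expectations are evaluated by using the density
$\varphi(q, p)$. By introducing:

$$f_I(q_I, p_I) = \int f(p, q)\, dq_J\, dp_J = \int f(p, q)\, dq_{j_1} dp_{j_1} ... dq_{j_k} dp_{j_k},$$

we obtain:

$$E\left( \frac{\partial H}{\partial q_i} \,\Big|\, q_I, p_I \right) = \frac{\int_{\mathbb{R}^{2k}} \frac{\partial H}{\partial q_i} f(p, q)\, dq_J\, dp_J}{f_I(q_I, p_I)}$$

and:

$$E\left( \frac{\partial H}{\partial p_i} \,\Big|\, q_I, p_I \right) = \frac{\int_{\mathbb{R}^{2k}} \frac{\partial H}{\partial p_i} f(p, q)\, dq_J\, dp_J}{f_I(q_I, p_I)}.$$

Let:

$$H_I = -\frac{1}{\lambda} \log\left( f_I(q_I, p_I) \right).$$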
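We have, for $i \in I$:

$$\frac{\partial H_I}{\partial q_i} = \frac{1}{f_I(q_I, p_I)} \int_{\mathbb{R}^{2k}} \frac{\partial H}{\partial q_i} f(p, q)\, dq_J\, dp_J = E\left( \frac{\partial H}{\partial q_i} \,\Big|\, q_I, p_I \right)$$

and

$$\frac{\partial H_I}{\partial p_i} = \frac{1}{f_I(q_I, p_I)} \int_{\mathbb{R}^{2k}} \frac{\partial H}{\partial p_i} f(p, q)\, dq_J\, dp_J = E\left( \frac{\partial H}{\partial p_i} \,\Big|\, q_I, p_I \right).$$

Thus:

$$\frac{dq_i}{dt} = \frac{\partial H_I}{\partial p_i}, \quad \frac{dp_i}{dt} = -\frac{\partial H_I}{\partial q_i}, \quad i \in I.$$  [6.32]

This equation shows that the approximated system [6.31] is Hamiltonian
and that its Hamiltonian is $H_I$. Moreover, the equations contain only the
variables of interest $(q_I, p_I)$.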
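As a worked illustration of this reduction, consider a two-degree-of-freedom system in which the second pair of variables is eliminated. The quadratic Hamiltonian and the coupling constant $c$ below are assumptions chosen so that the integrals can be computed in closed form; they do not come from the text.

```latex
% Assumed example: n = 2, I = \{1\}, J = \{2\}, |c| < 1
H(q,p) = \tfrac12\left(p_1^2 + p_2^2\right) + \tfrac12\left(q_1^2 + q_2^2\right) + c\,q_1 q_2 .

% Integrating f = \exp(-\lambda H) over (q_2, p_2) (two Gaussian integrals):
f_I(q_1,p_1)
  = e^{-\lambda\left(\frac12 p_1^2 + \frac12 q_1^2\right)}
    \int e^{-\lambda p_2^2/2}\,dp_2
    \int e^{-\lambda\left(\frac12 q_2^2 + c\,q_1 q_2\right)}\,dq_2
  = \frac{2\pi}{\lambda}\,
    e^{-\lambda\left(\frac12 p_1^2 + \frac12 (1-c^2)\,q_1^2\right)} .

% Reduced Hamiltonian and reduced equations of motion:
H_I = -\frac{1}{\lambda}\log f_I
    = \tfrac12 p_1^2 + \tfrac12 (1-c^2)\,q_1^2 - \frac{1}{\lambda}\log\frac{2\pi}{\lambda},
\qquad
\frac{dq_1}{dt} = p_1, \quad \frac{dp_1}{dt} = -(1-c^2)\,q_1 .
```

The same result follows directly from the conditional expectation: given $q_1$, the variable $q_2$ is Gaussian with mean $-c\,q_1$, so that $E(\partial H/\partial q_1 \mid q_1, p_1) = q_1 + c\,E(q_2 \mid q_1) = (1 - c^2)\,q_1$.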


6.5. Local solution of deterministic differential equations by stochastic
simulation

6.5.1. Ordinary differential equations


Let us consider the ordinary differential equation:

$$\alpha(x) u''(x) + \beta(x) u'(x) + \gamma(x) u(x) + f(x) = 0 \text{ on } (0, \ell); \quad u(0) = u_0, \; u(\ell) = u_\ell.$$

By setting $\Omega = (0, \ell)$, $\partial\Omega = \{0, \ell\}$ and

$$u_{\partial\Omega}(x) = u_0, \text{ if } x = 0; \quad u_{\partial\Omega}(x) = u_\ell, \text{ if } x = \ell,$$

we have:

$$\alpha(x) u''(x) + \beta(x) u'(x) + \gamma(x) u(x) + f(x) = 0 \text{ on } \Omega; \quad u = u_{\partial\Omega} \text{ on } \partial\Omega.$$
Let us assume that $\alpha > 0$; we consider:

$$a(x) = \sqrt{2\alpha(x)}; \quad b(x) = \beta(x)$$

and the stochastic diffusion:

$$dX_t = a(X_t)\, dW_t + b(X_t)\, dt; \quad X(0) = x \in \Omega.$$  [6.33]

Let us introduce the stochastic process:

$$Y(t) = u(X(t)) \exp\left( \int_0^t \gamma(X(s))\, ds \right).$$

We have $Y = F\left( t, X(t), \int_0^t \gamma(X(s))\, ds \right)$, where:

$$F(t, x, z) = u(x) \exp(z),$$

so that:

$$\frac{\partial F}{\partial z} = u(x) \exp(z); \quad \frac{\partial F}{\partial t} = 0; \quad \frac{\partial F}{\partial x} = u'(x) \exp(z); \quad \frac{\partial^2 F}{\partial x^2} = u''(x) \exp(z).$$

Thus:

$$dY_t = (A_t\, dW_t + B_t\, dt) \exp\left( \int_0^t \gamma(X(s))\, ds \right),$$

where:

$$A_t = a(X_t) u'(X_t); \quad B_t = \frac{1}{2} [a(X_t)]^2 u''(X_t) + b(X_t) u'(X_t) + \gamma(X_t) u(X_t).$$

Let:

$$\tau = \inf \{ t > 0 : X(t) \notin \Omega \}.$$
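For $t < \tau$, we have $X_t \in \Omega$, so that:

$$B_t = \underbrace{\frac{1}{2} [a(X_t)]^2}_{\alpha(X_t)} u''(X_t) + b(X_t) u'(X_t) + \gamma(X_t) u(X_t) = -f(X_t)$$

and consequently:

$$dY_t = \left( A_t\, dW_t - f(X_t)\, dt \right) \exp\left( \int_0^t \gamma(X(s))\, ds \right).$$

Thus:

$$Y(\tau) - Y(0) = \int_0^\tau A_t \exp\left( \int_0^t \gamma(X(s))\, ds \right) dW_t - \int_0^\tau f(X_t) \exp\left( \int_0^t \gamma(X(s))\, ds \right) dt.$$

Since:

$$E\left( \int_0^\tau A_t \exp\left( \int_0^t \gamma(X(s))\, ds \right) dW_t \right) = 0 \quad \text{and} \quad Y_0 = u(x),$$

we have:

$$u(x) = E(Z(\tau)); \quad Z(\tau) = Y(\tau) + \int_0^\tau f(X_t) \exp\left( \int_0^t \gamma(X(s))\, ds \right) dt.$$

For $t < \tau$, we have $X_t \in \Omega$ and there exists $\varepsilon > 0$ such that for
$\tau < t < \tau + \varepsilon$, we have $X_t \notin \Omega$, so that $X_\tau \in \partial\Omega$ and:

$$u(x) = E(Z(\tau)),$$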





where:

$$Z(\tau) = u_{\partial\Omega}(X_\tau) \exp\left( \int_0^\tau \gamma(X(t))\, dt \right) + \int_0^\tau f(X(t)) \exp\left( \int_0^t \gamma(X(s))\, ds \right) dt.$$

This result may be exploited for the numerical determination of $u(x)$. We
discretize [6.33] by using a standard method. For instance:

$$X_{n+1} - X_n = a(X_n) \Delta W_n + b(X_n) \Delta t; \quad X_0 = x \in \Omega. \quad \text{(Euler)}$$

Such a discretization furnishes a sequence of values $\{X_n\}_{n \ge 0}$, which may
be used in order to determine:

$$n = \min \{ i > 0 : X_i \notin \Omega \},$$

which gives an approximated realization of $Z(\tau)$:

$$Z(\tau) \approx u_{\partial\Omega}(X_n) \exp\left( \Delta t \sum_{j=0}^{n-1} \gamma(X_j) \right) + \Delta t \sum_{i=0}^{n-1} f(X_i) \exp\left( \Delta t \sum_{j=0}^{i-1} \gamma(X_j) \right).$$

By repeating this procedure $nr$ times, we generate a sample of $Z(\tau)$,
formed by the values $Z_1, ..., Z_{nr}$, and we may estimate:

$$u(x) \approx \frac{1}{nr} \sum_{i=1}^{nr} Z_i.$$
Matlab implementation is performed as follows: assume that
$a(x), b(x), \gamma(x), f(x)$ are furnished by the subprograms a(x), b(x), c(x), f(x),
respectively. In addition, assume that a subprogram interior(x) returns either
the value false (or 0) if $x \notin (0, \ell)$ or the value true (or 1) if $x \in (0, \ell)$. Finally,
assume that the boundary values are furnished by a subprogram ub(x) such
that $ub(0) = u_0$ and $ub(\ell) = u_\ell$. The following is an example of Matlab code.
Listing 6.3. Deterministic boundary value problem

function v = ode_stoc_part(a, b, c, f, interior, u_b, xc, ns, delta_t)
%
% determines the solution of the ODE
% (a(x)^2/2)*u''(x) + b(x)*u'(x) + c(x)*u(x) + f(x) = 0 on (xl, xu)
% u(xl) = ul, u(xu) = uu
% at the points of vector xc
% by performing ns simulations with step delta_t
%
% IN:
% a,b,c,f : coefficients - type anonymous function
% interior : true, if x is interior; false otherwise - type anonymous function
% u_b : boundary value of the solution - type anonymous function
% xc : points where the solution has to be evaluated - type array of double
% ns : number of simulations - type integer
% delta_t : time step - type double
%
% OUT:
% v : estimation of the value - type array of double
%
v = zeros(size(xc));
aa = @(x, t) a(x);   % new_point_ito expects coefficients of (x, t)
bb = @(x, t) b(x);
for n = 1:length(xc)
    x = xc(n);
    s = 0;
    for i = 1:ns
        aux = Z_variate(aa, bb, c, f, delta_t, interior, u_b, x);
        s = s + aux;
    end;
    v(n) = s/ns;   % empirical mean of the ns variates
end;
return;
end

function v = Z_variate(a, b, gama, f, dt, interior, u_boundary, xa)
%
% furnishes one variate from Z
%
x = xa;
t = 0;
inreg = interior(x);   % interior takes x only, as described in the text
s_gama = 0;            % running sum of gama(X_j)
s_f = 0;               % running sum of f(X_i)*exp(dt*sum of gama(X_j), j < i)
ndimw = length(x);
while inreg
    s_f = s_f + f(x)*exp(dt*s_gama);
    s_gama = s_gama + gama(x);
    [xn, tn] = new_point_ito(a, b, x, t, ndimw, dt);
    x = xn;
    t = tn;
    inreg = interior(x);
end;
v = u_boundary(x)*exp(dt*s_gama) + dt*s_f;
return;
end

function [xn, tn] = new_point_ito(a, b, x, t, ndimw, dt)
%
% furnishes the new point corresponding to one step of
% the numerical simulation of Ito's diffusion
% dX_t = a(X_t, t) dW_t + b(X_t, t) dt
%
% IN:
% a: the first coefficient - type anonymous function
% b: the second coefficient - type anonymous function
% x: the actual point - type double
% t: the actual time - type double
% ndimw: dimension of the Wiener process
% dt: the time step - type double
%
% OUT:
% xn: the new value of X - type array of double
% tn: the new time - type double
%
dw = randn(ndimw, 1)*sqrt(dt);   % Wiener increment with variance dt
dx = a(x, t)*dw + b(x, t)*dt;    % Euler step
xn = x + dx;
tn = t + dt;
return;
end


Example 6.3.- Let us consider the differential equation:

$$u'' - u = 0 \text{ on } (0, 1); \quad u(0) = 1, \; u(1) = e.$$

The exact solution is $u(x) = \exp(x)$. Using the method above with ns =
2,500, we obtain the results shown in Table 6.1. The relative error in mean
quadratic norm is about 3% in all the simulations.
Table 6.1. Results for Example 6.3 (rows: Δt = 1e-3, Δt = 1e-4 and exact
values; columns: x = 0.1, ..., 0.75, 0.9; the numerical entries are not
legible in the source)
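The estimator $u(x) \approx$ mean of the variates $Z_i$ can also be checked on a problem whose solution is known in closed form. The sketch below is self-contained and does not use the listing above: it treats the assumed test problem $u'' + 2 = 0$ on $(0, 1)$, $u(0) = u(1) = 0$ (so $\alpha = 1$, $a = \sqrt{2}$, $\beta = \gamma = 0$, $f = 2$), whose exact solution is $u(x) = x(1 - x)$. The test problem, the evaluation point and the numerical parameters are assumptions made for the illustration.

```matlab
% Sketch: Monte Carlo estimate of u(x0) = E(Z(tau)) for the assumed problem
%   u'' + 2 = 0 on (0,1), u(0) = u(1) = 0, exact solution u(x) = x*(1 - x).
a = sqrt(2);                 % a = sqrt(2*alpha) with alpha = 1
fval = 2;                    % constant source term f
x0 = 0.5;                    % evaluation point (assumed)
dt = 1e-4; ns = 4000;        % time step and number of simulations (assumed)

x = x0*ones(ns, 1);          % all paths start at x0
s_f = zeros(ns, 1);          % running sums of f along each path
active = true(ns, 1);        % paths that are still inside (0,1)
while any(active)
    s_f(active) = s_f(active) + fval;   % gamma = 0: exponential factor is 1
    x(active) = x(active) + a*sqrt(dt)*randn(nnz(active), 1);   % Euler step
    active = active & (x > 0) & (x < 1);
end
Z = dt*s_f;                  % boundary values are 0 at both ends
u_est = mean(Z);
fprintf('estimate = %.4f, exact = %.4f\n', u_est, x0*(1 - x0));
```

All paths are advanced together and a path is frozen as soon as it leaves the interval, which avoids one `while` loop per simulation.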


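6.5.2. Elliptic partial differential equations

Let us consider the PDE (we use Einstein's convention about the sum of
repeated indexes $1 \le i, j \le n$):

$$\alpha_{ij}(x) \frac{\partial^2 u}{\partial x_i \partial x_j} + \beta_i(x) \frac{\partial u}{\partial x_i} + \gamma(x) u(x) + f(x) = 0 \text{ on } \Omega; \quad u = u_{\partial\Omega} \text{ on } \partial\Omega.$$

Let us assume that:

$$\alpha = \frac{1}{2} a a^t, \quad \text{i.e.,} \quad \alpha_{ij} = \frac{1}{2} a_{ik} a_{jk}.$$  [6.34]

We consider the multidimensional stochastic diffusion:

$$dX_t = a(X_t)\, dW_t + b(X_t)\, dt \;\; (b = \beta); \quad X(0) = x \in \Omega$$  [6.35]

and the stochastic process:

$$Y(t) = u(X(t)) \exp\left( \int_0^t \gamma(X(s))\, ds \right).$$

We have $Y = F\left( t, X(t), \int_0^t \gamma(X(s))\, ds \right)$, where
$F: \mathbb{R} \times \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$ verifies:

$$F(t, x, z) = u(x) \exp(z).$$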
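In this case:

$$\frac{\partial F}{\partial z} = u(x) \exp(z); \quad \frac{\partial F}{\partial t} = 0; \quad \frac{\partial F}{\partial x_i} = \frac{\partial u}{\partial x_i}(x) \exp(z); \quad \frac{\partial^2 F}{\partial x_i \partial x_j} = \frac{\partial^2 u}{\partial x_i \partial x_j}(x) \exp(z),$$

so that:

$$dY_t = \left( (A_t)_j\, d(W_t)_j + B_t\, dt \right) \exp\left( \int_0^t \gamma(X(s))\, ds \right),$$

where:

$$A_t = a(X_t)^t \nabla u(X_t); \quad B_t = \frac{1}{2} a_{ik}(X_t) a_{jk}(X_t) \frac{\partial^2 u}{\partial x_i \partial x_j}(X_t) + b_i(X_t) \frac{\partial u}{\partial x_i}(X_t) + \gamma(X_t) u(X_t).$$

In an analogous manner, we consider:

$$\tau = \inf \{ t > 0 : X(t) \notin \Omega \}.$$

For $t < \tau$, we have $X_t \in \Omega$, so that, for $t < \tau$:

$$B_t = -f(X_t)$$

and we also have:

$$u(x) = E(Z(\tau)),$$

where:

$$Z(\tau) = u_{\partial\Omega}(X_\tau) \exp\left( \int_0^\tau \gamma(X(t))\, dt \right) + \int_0^\tau f(X(t)) \exp\left( \int_0^t \gamma(X(s))\, ds \right) dt.$$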

Thus, $u(x)$ may be approximated by the method presented in the previous
section. Matlab implementation is performed by modifying the subprograms
evaluating $a(x)$, $b(x)$ (a(x), b(x), respectively): here, x is a multidimensional
vector instead of a real number. Analogously to the preceding situation, let us
assume that, on the one hand, a subprogram interior(x) returns either the value
false (or 0) if $x \notin \Omega$ or the value true (or 1) if $x \in \Omega$, and, on the other hand,
the boundary values are furnished by a subprogram ub(x). Then, an example
of Matlab code is:

Listing 6.4. Deterministic elliptic PDE

function v = ell_stoc_part(a, b, c, f, interior, u_b, xc, ns, delta_t)
%
% determines the solution of the PDE
% (a_ik(x)*a_jk(x)/2)*D_ij u(x) + b_i(x)*D_i u(x) + c(x)*u(x) + f(x) = 0 on \Omega
% u(x) = u_b(x) on \partial \Omega
% at the points of array xc
% by performing ns simulations with step delta_t
%
% IN:
% a,b,c,f : coefficients - type anonymous function
% interior : true, if x is interior; false otherwise - type anonymous function
% u_b : boundary value of the solution - type anonymous function
% xc : points where the solution has to be evaluated - type array of double
%      xc(:,i) is a point of evaluation
% ns : number of simulations - type integer
% delta_t : time step - type double
%
% OUT:
% v : estimation of the value - type array of double
%
v = zeros(1, size(xc, 2));
aa = @(x, t) a(x);   % new_point_ito expects coefficients of (x, t)
bb = @(x, t) b(x);
for n = 1:size(xc, 2)   % one column of xc per evaluation point
    x = xc(:, n);
    s = 0;
    for i = 1:ns
        aux = Z_variate(aa, bb, c, f, delta_t, interior, u_b, x);
        s = s + aux;
    end;
    v(n) = s/ns;
end;
return;
end


Example 6.4.- Let us consider $\Omega = \{x = (x_1, x_2, x_3) \in \mathbb{R}^3 : x_i > 0$ for $i =
1, 2, 3$ and $x_1^2 + x_2^2 + x_3^2 < 1\}$ and the PDE:

$$\Delta u = 0 \text{ on } \Omega; \quad u(x) = x_1^2 - x_2^2/2 - x_3^2/2 \text{ on } \partial\Omega.$$

The exact solution is $u(x) = x_1^2 - x_2^2/2 - x_3^2/2$. We evaluate the solution
at the points $(2r, r, r)/\sqrt{6}$ for different values of $r$. Using the method above
with Δt = 1e-3, we obtain the results shown in Table 6.2. The relative error
is about 0.4% in all the simulations.


r             0.1      0.25          0.5      0.75     0.9
ns = 2,500    0.0051   0.0323        0.1211   0.2915   0.39745
ns = 10,000   0.0052   [illegible]   0.1239   0.2806   0.3966
ns = 25,000   0.0047   0.0311        0.1246   0.2813   0.4068
exact         0.0050   [illegible]   0.1250   0.2813   0.4050

Table 6.2. Results for Example 6.4


Results obtained using ns = 2, 500 and different values of At are shown 
in Table 6.3. 


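r            0.1           0.25          0.5           0.75          0.9
Δt = 1e-3    [illegible]   [illegible]   [illegible]   [illegible]   0.39745
Δt = 1e-4    [illegible]   [illegible]   [illegible]   [illegible]   0.4037
exact        [illegible]   [illegible]   [illegible]   [illegible]   0.4050

Table 6.3. Results for Example 6.4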
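For the Laplace equation, $\alpha$ is the identity matrix, so that [6.34] is satisfied by $a = \sqrt{2}\, I$, with $b = 0$, $\gamma = 0$ and $f = 0$; the estimator then reduces to the mean of the boundary values at the exit points. The following self-contained sketch applies this to the domain and boundary data of Example 6.4 at the point corresponding to $r = 0.5$. The vectorized organization and the numerical parameters are assumptions made for the illustration; the sketch does not reproduce the listing above.

```matlab
% Sketch: Monte Carlo solution of Laplace's equation on the positive octant
% of the unit ball, with boundary data u_b(x) = x1^2 - x2^2/2 - x3^2/2.
ub = @(x) x(1,:).^2 - 0.5*x(2,:).^2 - 0.5*x(3,:).^2;       % boundary values
interior = @(x) all(x > 0, 1) & (sum(x.^2, 1) < 1);         % true inside Omega

r = 0.5;
x0 = [2*r; r; r]/sqrt(6);          % evaluation point
dt = 1e-3; ns = 2500;              % time step and number of simulations

X = repmat(x0, 1, ns);             % one column per simulated path
active = interior(X);              % paths that have not left Omega yet
while any(active)
    na = nnz(active);
    % Euler step of dX = sqrt(2) dW (a = sqrt(2)*I, b = 0)
    X(:, active) = X(:, active) + sqrt(2*dt)*randn(3, na);
    active(active) = interior(X(:, active));
end
u_est = mean(ub(X));               % gamma = 0, f = 0: Z = u_b at the exit point
fprintf('estimate = %.4f, exact = %.4f\n', u_est, r^2/2);
```

At these points the exact solution equals $r^2/2$, which gives 0.125 for $r = 0.5$.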


6.5.3. Parabolic partial differential equations 

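Let us consider the PDE (we use Einstein's convention about the sum of
repeated indexes $1 \le i, j \le n$):

$$\frac{\partial u}{\partial t}(x, t) = \alpha_{ij}(x, t) \frac{\partial^2 u}{\partial x_i \partial x_j} + \beta_i(x, t) \frac{\partial u}{\partial x_i} + \gamma(x, t) u(x, t) + f(x, t) \text{ on } \Omega \times (0, T);$$
$$u = u_{\partial\Omega} \text{ on } \partial\Omega; \quad u(x, 0) = u_0(x) \text{ on } \Omega.$$

We assume that [6.34] is satisfied and we consider the multidimensional
stochastic diffusion:

$$dX_t = a(X_t, S_t)\, dW_t + b(X_t, S_t)\, dt \;\; (b = \beta); \quad dS_t = -dt,$$  [6.36]
$$X(0) = x \in \Omega; \quad S(0) = t$$  [6.37]

and the stochastic process:

$$Y(t) = u(X(t), S(t)) \exp\left( \int_0^t \gamma(X(s), S(s))\, ds \right).$$

By setting $\mathbf{X} = (X, S)$, we have $Y = F\left( t, \mathbf{X}(t), \int_0^t \gamma\, ds \right)$,
where $F: \mathbb{R} \times \mathbb{R}^{n+1} \times \mathbb{R} \to \mathbb{R}$ verifies

$$F(t, \mathbf{x}, z) = u(\mathbf{x}) \exp(z).$$

In this case, for

$$\tau = \inf \{ t > 0 : \mathbf{X}(t) \notin \Omega \times (0, T) \},$$

we have:

$$u(x, t) = E(Z(\tau)),$$

$$Z(\tau) = u_{\partial\Omega}(X_\tau, S_\tau) \exp\left( \int_0^\tau \gamma(X(t), S(t))\, dt \right) + \int_0^\tau f(X(t), S(t)) \exp\left( \int_0^t \gamma(X(s), S(s))\, ds \right) dt.$$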
Thus, we may approximate $u(x, t)$ by the method previously introduced.
We note that:

$$\tau = \inf \{ t > 0 : X(t) \notin \Omega \text{ or } S(t) \notin (0, T) \}.$$

For Matlab implementation, let us assume that subprograms evaluating
$a(x, t)$, $b(x, t)$, $\gamma(x, t)$ are available (a(x, t), b(x, t), gama(x, t), respectively).
The subprogram interior(x, t) returns the value false (or 0) if $x \notin \Omega$ or
$t \notin (0, T)$. It returns the value true (or 1) if $x \in \Omega$ and $t \in (0, T)$. Moreover,
the boundary values are furnished by a subprogram ub(x, t) and the initial
values by a subprogram u0(x). Then, an example of Matlab code is:

Listing 6.5. Deterministic parabolic PDE

function v = for_stoc_part(a, b, c, f, interior, u_b, u_0, xtc, ns, delta_t)
%
% determines the solution of the PDE
% D_t u = (a_ik*a_jk/2)*D_ij u + b_i*D_i u + c*u + f on \Omega x (0,T)
% u = u_b on \partial \Omega, u(x,0) = u_0(x) on \Omega
% at the points of array xtc
% by performing ns simulations with step delta_t
%
% IN:
% a,b,c,f : coefficients - type anonymous function
% interior : true, if (x,t) is interior; false otherwise - type anonymous function
% u_b : boundary value of the solution - type anonymous function
% u_0 : initial value of the solution - type anonymous function
% xtc : points where the solution has to be evaluated - type array of double
%       xtc(1:ndim,n) = spatial coordinates of the point number n
%       xtc(ndim+1,n) = temporal coordinate of the point number n
% ns : number of simulations - type integer
% delta_t : time step - type double
%
% OUT:
% v : estimation of the value - type array of double
%
v = zeros(1, size(xtc, 2));
ndim = size(xtc, 1) - 1;
for n = 1:size(xtc, 2)   % one column of xtc per evaluation point
    x = xtc(1:ndim, n);
    t = xtc(ndim + 1, n);
    s = 0;
    for i = 1:ns
        aux = Z_variate(a, b, c, f, delta_t, interior, u_b, u_0, x, t);
        s = s + aux;
    end;
    v(n) = s/ns;
    % displays the estimate next to the exact solution of Example 6.5
    disp(['v = ', num2str(v(n)), ' , ', ...
        num2str(xtc(1,n)^2 - 0.5*(xtc(2,n)^2 + xtc(3,n)^2) + t)]);
end;
return;
end

function v = Z_variate(a, b, gama, f, dt, interior, u_boundary, u_0, xa, ta)
%
% furnishes one variate from Z
%
x = xa;
t = ta;
inreg = interior(x, t);
s_gama = 0;
s_f = 0;
ndimw = length(x);
while inreg
    % f and gama are evaluated at (x, t), as assumed in the text
    s_f = s_f + f(x, t)*exp(dt*s_gama);
    s_gama = s_gama + gama(x, t);
    [xn, tn] = new_point_ito(a, b, x, t, ndimw, dt);
    inreg = interior(xn, tn);
    x = xn;
    t = tn;
end;
if t <= 0
    % the path reached the initial time: use the initial value
    v = u_0(x)*exp(dt*s_gama) + dt*s_f;
else
    % the path left Omega first: use the boundary value
    v = u_boundary(x, t)*exp(dt*s_gama) + dt*s_f;
end;
return;
end

function [xn, tn] = new_point_ito(a, b, x, t, ndimw, dt)
%
% furnishes the new point corresponding to one step of
% the numerical simulation of Ito's diffusion
% dX_t = a(X_t, t) dW_t + b(X_t, t) dt
% dS_t = -dt
%
% IN:
% a: the first coefficient - type anonymous function
% b: the second coefficient - type anonymous function
% x: the actual point - type double
% t: the actual time - type double
% ndimw: dimension of the Wiener process
% dt: the time step - type double
%
% OUT:
% xn: the new value of X - type array of double
% tn: the new time - type double
%
dw = randn(ndimw, 1)*sqrt(dt);   % Wiener increment with variance dt
dx = a(x, t)*dw + b(x, t)*dt;    % Euler step
xn = x + dx;
tn = t - dt;                     % the time variable S decreases
return;
end


Example 6.5.- Let us consider $\Omega = \{x = (x_1, x_2, x_3) \in \mathbb{R}^3 : x_i > 0$ for $i =
1, 2, 3$ and $x_1^2 + x_2^2 + x_3^2 < 1\}$ and the PDE:

$$\frac{\partial u}{\partial t}(x, t) = \Delta u + 1 \text{ on } \Omega; \quad u(x, t) = x_1^2 - x_2^2/2 - x_3^2/2 + t \text{ on } \partial\Omega;$$
$$u(x, 0) = x_1^2 - x_2^2/2 - x_3^2/2 \text{ on } \Omega.$$

The exact solution is $u(x, t) = x_1^2 - x_2^2/2 - x_3^2/2 + t$. We evaluate the
solution at the points $(x, t) = ((2r, r, r)/\sqrt{6},\, t(r))$ for different values of $r$.
Using the method above with Δt = 1e-3, we obtain the results shown in
Table 6.4. The relative error is about 0.4% in all the simulations.


r             0.1      0.25     0.5      0.75     0.9
t(r)          0.1      0.2      0.3      0.4      0.5
ns = 2,500    0.1050   0.2328   0.4213   0.6824   0.9069
ns = 10,000   0.1045   0.2331   0.4254   0.6820   0.9042
ns = 25,000   0.1051   0.2310   0.4254   0.6823   0.9060
exact         0.1050   0.2313   0.4250   0.6813   0.9050

Table 6.4. Results for Example 6.5
Results obtained using ns = 2,500 and different values of Δt are shown
in Table 6.5.

r            0.1      0.25     0.5      0.75     0.9
t(r)         0.1      0.2      0.3      0.4      0.5
Δt = 1e-3    0.1050   0.2328   0.4213   0.6824   0.9069
Δt = 1e-4    0.1046   0.2332   0.4285   0.6838   0.9058
exact        0.1050   0.2313   0.4250   0.6813   0.9050

Table 6.5. Results for Example 6.5
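The role of the backward time variable $S$ can be seen on a one-dimensional heat equation. The sketch below is self-contained and treats the assumed test problem $\partial u/\partial t = \partial^2 u/\partial x^2$ on $(0, \pi)$ with $u = 0$ on the boundary and $u(x, 0) = \sin(x)$, whose exact solution is $u(x, t) = e^{-t} \sin(x)$. Here $a = \sqrt{2}$, $b = 0$, $\gamma = 0$ and $f = 0$; the test problem and the numerical parameters are assumptions made for the illustration.

```matlab
% Sketch: Monte Carlo estimate of u(x0, t0) for the assumed heat equation
%   u_t = u_xx on (0,pi), u = 0 on the boundary, u(x,0) = sin(x).
x0 = pi/2; t0 = 0.5;          % evaluation point (assumed)
dt = 1e-3; ns = 4000;         % time step and number of simulations (assumed)
nsteps = round(t0/dt);        % S goes from t0 down to 0 in nsteps steps

x = x0*ones(ns, 1);
alive = true(ns, 1);          % paths not yet absorbed at the spatial boundary
for n = 1:nsteps
    x(alive) = x(alive) + sqrt(2*dt)*randn(nnz(alive), 1);   % a = sqrt(2)
    alive = alive & (x > 0) & (x < pi);
end
Z = zeros(ns, 1);             % absorbed paths take the boundary value 0
Z(alive) = sin(x(alive));     % surviving paths take the initial value u_0
u_est = mean(Z);
exact = exp(-t0)*sin(x0);
fprintf('estimate = %.4f, exact = %.4f\n', u_est, exact);
```

A path contributes the boundary value if it leaves the spatial domain before $S$ reaches 0, and the initial value otherwise, which mirrors the final test on `t` in the listing above.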


6.5.4. Finite difference schemes

Finite difference approximation schemes may be interpreted as
probabilistic schemes: let us consider a sequence of random variables
$\{X_n\}_{n \ge 0}$ such that the law of $X_{n+1}$ conditional to $X_n$ is independent of $n$
and given by:

$$f(x \mid X_n = y) = \pi_h(x, y),$$

where $h > 0$ is a parameter.

Let $\{X^h_t\}_{t \ge 0}$ be the stochastic process obtained by linear interpolation of
$\{X_n\}_{n \ge 0}$ with a step $h$:

$$X^h(nh) = X_n; \quad t \in (nh, (n+1)h) \Longrightarrow X^h(t) = X_n + \left( \frac{t - nh}{h} \right) (X_{n+1} - X_n).$$

Let us consider:

$$a^h_{ij}(x) = \frac{1}{h} \int_{\|y - x\| \le 1} (y_i - x_i)(y_j - x_j)\, \pi_h(x, y)\, \ell(dy),$$

$$b^h_i(x) = \frac{1}{h} \int_{\|y - x\| \le 1} (y_i - x_i)\, \pi_h(x, y)\, \ell(dy),$$

$$A_h(u)(x) = \frac{1}{h} \int_{\mathbb{R}^n} (u(y) - u(x))\, \pi_h(x, y)\, \ell(dy).$$
We have (see [DAU 89]):

Theorem 6.1.- If

i) $a^h(x) \to a(x)$ and $b^h(x) \to b(x)$ uniformly on every compact subset
of $\mathbb{R}^n$ for $h \to 0+$;

ii) $\{a^h(x)\}_{h > 0}$ and $\{b^h(x)\}_{h > 0}$ are bounded, independently of $x$;

iii) $\forall \varepsilon > 0$: $\sup \left\{ \frac{1}{h} \int_{\mathbb{R}^n - B_\varepsilon(x)} \pi_h(x, y)\, \ell(dy) : x \in \mathbb{R}^n \right\} \to 0$ for $h \to 0+$;

then, for $h \to 0+$, $X^h_t$ converges in distribution to $X_t$ such that:

$$dX_t = a(X_t)\, dW_t + b(X_t)\, dt; \quad X(0) = x$$

and

$$A_h(u)(x) \to A(u)(x) = \frac{d}{dt} E\left( u(X_t) \mid X_0 = x \right) \Big|_{t=0}.$$ ■

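Let us consider the equation:

$$\Delta u = -f \text{ on } \Omega \subset \mathbb{R}^2; \quad u = u_{\partial\Omega} \text{ on } \partial\Omega.$$

When using the finite difference discretization scheme defined by:

$$\frac{u_{i+1,j} - 2u_{i,j} + u_{i-1,j}}{h_1^2} + \frac{u_{i,j+1} - 2u_{i,j} + u_{i,j-1}}{h_2^2} = -f_{i,j},$$

we have:

$$\frac{1}{h_1^2} u_{i+1,j} + \frac{1}{h_1^2} u_{i-1,j} + \frac{1}{h_2^2} u_{i,j+1} + \frac{1}{h_2^2} u_{i,j-1} - 2\left( \frac{1}{h_1^2} + \frac{1}{h_2^2} \right) u_{i,j} = -f_{i,j},$$

i.e.:

$$\Pi_{i+1,j} u_{i+1,j} + \Pi_{i-1,j} u_{i-1,j} + \Pi_{i,j+1} u_{i,j+1} + \Pi_{i,j-1} u_{i,j-1} - u_{i,j} = -h f_{i,j},$$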
where:

$$h = \frac{h_1^2 h_2^2}{2 (h_1^2 + h_2^2)}; \quad \Pi_{i+1,j} = \Pi_{i-1,j} = \frac{h}{h_1^2}; \quad \Pi_{i,j+1} = \Pi_{i,j-1} = \frac{h}{h_2^2}.$$

We observe that:

$$\Pi_{i+1,j} + \Pi_{i-1,j} + \Pi_{i,j+1} + \Pi_{i,j-1} = 1.$$

Let $\{e_1, e_2\}$ be the canonical basis of $\mathbb{R}^2$ and:

$$\pi_h(x, y) = \Pi_{i-1,j}\, \delta(x - e_1 - y) + \Pi_{i+1,j}\, \delta(x + e_1 - y) + \Pi_{i,j-1}\, \delta(x - e_2 - y) + \Pi_{i,j+1}\, \delta(x + e_2 - y).$$

We have:

$$A_h(u)(x) = \Pi_{i+1,j} u(x + e_1) + \Pi_{i-1,j} u(x - e_1) + \Pi_{i,j+1} u(x + e_2) + \Pi_{i,j-1} u(x - e_2);$$

$$a^h_{ij}(x) = 0, \text{ if } i \ne j; \quad a^h_{11}(x) = \Pi_{i+1,j} + \Pi_{i-1,j}; \quad a^h_{22}(x) = \Pi_{i,j+1} + \Pi_{i,j-1}; \quad b^h = 0.$$

Applying Theorem 6.1, we obtain:

$$\frac{d}{dt} E\left( u(X_t) \mid X_0 = x \right) \Big|_{t=0} = \lim_{h \to 0+} A_h(u)(x),$$

so that we may approximate the solution by using $X^h_t$. We have:

$$E\left( u(X_{i+1}) \mid X_i \right) - u(X_i) \approx h A_h(u)(X_i) = -h f(X_i).$$

By summing these equalities up to the index $n - 1$ and taking the
mean of the result:

$$u(x) \approx E\left( u_{\partial\Omega}(X_n) + h \sum_{i=0}^{n-1} f(X_i) \right).$$
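The final estimate can be illustrated by a random walk on a uniform grid. With $h_1 = h_2$, the four transition probabilities are all equal to 1/4 and $h = h_1^2/4$. The sketch below treats the assumed test problem $\Delta u = -2$ on the unit square with boundary values $x_1(1 - x_1)$, whose exact solution is $u(x) = x_1(1 - x_1)$; the test problem, the grid size and the starting node are assumptions made for the illustration.

```matlab
% Sketch: random walk interpretation of the five-point scheme for
%   Delta u = -f on (0,1)^2, u = u_b on the boundary (assumed test problem).
N = 20; h1 = 1/N;                  % uniform grid, h1 = h2 (assumed)
h = h1^2/4;                        % h = h1^2*h2^2/(2*(h1^2 + h2^2))
f  = @(x, y) 2;                    % source term of the assumed problem
ub = @(x, y) x.*(1 - x);           % boundary values of the assumed problem
i0 = 10; j0 = 5;                   % starting node: x = (0.5, 0.25)
ns = 2000;                         % number of simulated walks (assumed)
steps = [1 0; -1 0; 0 1; 0 -1];    % four neighbours, probability 1/4 each

Z = zeros(ns, 1);
for s = 1:ns
    i = i0; j = j0; s_f = 0;
    while i > 0 && i < N && j > 0 && j < N    % walk until the boundary is hit
        s_f = s_f + f(i*h1, j*h1);
        d = steps(randi(4), :);
        i = i + d(1); j = j + d(2);
    end
    Z(s) = ub(i*h1, j*h1) + h*s_f;   % u_b(X_n) + h * sum of f(X_i)
end
u_est = mean(Z);
fprintf('estimate = %.4f, exact = %.4f\n', u_est, 0.5*(1 - 0.5));
```

Since the five-point scheme is exact for quadratic functions, the only error in this example is the statistical one.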
6.6. Statistics of dynamical systems

The analysis of dynamical systems affected by uncertainty is a subject under
development. In mechanics, for instance, the variability of material parameters,
geometry, initial conditions or boundary conditions introduces uncertainty in
the dynamical systems. For instance, a pendulum may be affected by various
uncertainties concerning its length, mass, rigidity, damping, initial position,
velocity, etc. All these variabilities affect its dynamics and some of them may
have a significant impact on its natural frequencies and its stability.

Specific difficulties arise when we are interested in periodic motions, since
the usual parameters - such as period, eigenvalues of the linearized system,
etc. - become random variables. For instance, some eigenvalues may have
a sign that depends on the values taken by these parameters, with consequences
for the stability. In addition, typical curves such as limit cycles, periodic orbits
and Poincaré sections become random objects.

As in the other situations considered in this text, we are interested in the
determination of the probability distributions and statistics of these elements
and some associated parameters. This kind of analysis involves particular
difficulties: curves such as limit cycles, orbits, etc., belong to
infinite-dimensional functional spaces and we need to construct probabilities
in these spaces. In addition, these probabilities must be operational in the
sense that they have to be usable for the numerical evaluation of means,
variances and probabilities of events. We may find in the literature a
traditional procedure furnished by cylindrical measures (see, for example,
[SCH 69, BAD 70, BAD 74]), but this approach is not operational enough
to furnish numerical methods for the evaluation of the quantities of
interest. It may be convenient to adopt the point of view of section 1.9, which
leads to the construction of statistics in a natural manner. It is assumed in
the following that the reader is able to construct these operational
infinite-dimensional probabilities: the presented results use the approach of
section 1.9.

Let us consider a general dynamical system on $\mathbb{R}^n$, which is defined by the
equations:

$$\frac{dx}{dt} = f(x, t) \quad \forall t \in [0; T], \; x \in \mathbb{R}^n; \qquad x(0) = x_0, \; x_0 \in \mathbb{R}^n.$$  [6.38]
A simple example is given by the basic harmonic oscillator formed by a
spring and a mass (oscillator 1):

$$\dot{x}_1(t) = x_2(t), \quad \dot{x}_2(t) = -\frac{k}{m} x_1(t) \quad \forall t \in [0; T]; \qquad x(0) = x_0 \text{ random},$$  [6.39]

where $x = (x_1; x_2)^t$, $k$ is the spring's stiffness, $m$ is the mass and $\dot{x}_i =
dx_i/dt$. Figures 6.7 and 6.8 (see [CRO 10]) show the phase portrait and the
trajectories for different values of $x_0$ with $k = 1$ and $m = 1$. It is interesting
to note that the limit cycle varies with $x_0$.


Figure 6.7. Phase portrait of oscillator 1 for different initial conditions
(k = 1 and m = 1)

Figure 6.8. Evolution of $x_1(t)$ for oscillator 1 for different initial conditions


A more complex situation is furnished by Van der Pol's oscillator
[DER28], which has been extensively studied in the literature. One of its
interesting properties is the existence of a limit cycle that is independent of
the initial conditions. Van der Pol's equations are given by (oscillator 2):

$$\dot{x}_1 = x_2, \quad \dot{x}_2 = \varepsilon (1 - x_1^2) x_2 - \Omega_0^2 x_1 \quad \forall t \in [0; T]; \qquad x(0) = x_0,$$  [6.40]

where $\varepsilon$ is a parameter measuring the nonlinearity of the system and $\Omega_0$ is a
positive random variable. $\Omega_0$ influences the properties of the system, as may
be observed in the figures below: the limit cycles are shown in Figure 6.9 and
the trajectories for $x_1$ are exhibited in Figure 6.10 (we use $\varepsilon = 1$). Table 6.6
shows the values of the period (see [CRO 10]).
Figure 6.9. Periodic orbits for oscillator 2 ($\varepsilon = 1$ and $x_0 = (1; 1)$)

Figure 6.10. Evolution of $x_1(t)$ for oscillator 2 ($\varepsilon = 1$ and $x_0 = (1; 1)$)
Ω_0    period   max_t x_1(t)   min_t x_1(t)   max_t x_2(t)   min_t x_2(t)
0.8    8.58     2.01           -2.01          2.35           -2.35
1      6.66     2.00           -2.01          2.67           -2.67
1.2    5.46     2.01           -2.00          3.01           -3.00

Table 6.6. Characteristics of the periodic orbits for different values of Ω_0
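The characteristics reported in Table 6.6 can be estimated by direct numerical integration. The sketch below integrates Van der Pol's equations with `ode45` for $\varepsilon = 1$ and $\Omega_0 = 1$ and estimates the period from the upward zero crossings of $x_1$. The restoring term is assumed here to be $\Omega_0^2 x_1$; the integration horizon, the tolerances and the crossing-detection procedure are also assumptions made for the illustration.

```matlab
% Sketch: period and amplitude of the limit cycle of Van der Pol's oscillator.
epsilon = 1; Omega0 = 1;           % parameter values of the second row of Table 6.6
rhs = @(t, x) [x(2); epsilon*(1 - x(1)^2)*x(2) - Omega0^2*x(1)];
opts = odeset('RelTol', 1e-8, 'AbsTol', 1e-10);
[t, x] = ode45(rhs, [0 60], [1; 1], opts);   % x0 = (1; 1)

% upward zero crossings of x1, located by linear interpolation
x1 = x(:, 1);
k = find(x1(1:end-1) < 0 & x1(2:end) >= 0);
tc = t(k) - x1(k).*(t(k+1) - t(k))./(x1(k+1) - x1(k));

period = mean(diff(tc(end-2:end)));          % average over the last two cycles
amp = max(x1(t > 30));                       % amplitude after the transient
fprintf('period = %.2f, max x1 = %.2f\n', period, amp);
```

Repeating the computation for a sample of values of $\Omega_0$ gives a sample of periods, from which statistics of the period may be estimated.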

6.6.1. Evaluation of statistics of the limit cycle

A limit cycle is a periodic orbit (hence a closed curve), so it is
described by a continuous function $y: I \to \mathbb{R}^n$, where $I$ is an interval of $\mathbb{R}$.
Since we are not interested in temporal aspects, we may, without loss of
generality, consider $I = [0; 1]$. As explained in section 1.9, we consider curves
belonging to a separable Hilbert space $V$, possessing a Hilbert basis
$\Phi = \{\phi_n\}_{n \ge 1}$, and we use a representation of the limit cycles in this basis:

$$Y(\omega)(\cdot) = \sum_{i \ge 1} A_i(\omega)\, \phi_i(\cdot).$$  [6.41]

Thus, the limit cycle is a random variable taking its values in $V$, which
may be identified with the sequence $\{A_i(\omega)\}_{i \ge 1}$ of the coefficients of its
expansion in the Hilbert basis under consideration. Usually, the mean of a
random variable $X$ having a law $\mu$ and taking values on $\Omega$ is given by:

$$E[X] = \int_\Omega x(\omega)\, d\mu(\omega).$$  [6.42]

Here, $\mu$ is defined on $V$ - which is an infinite-dimensional space - and the
classical definitions given by Riemann or Lebesgue do not apply. The appropriate
tool to handle this situation is furnished by Bochner's functional spaces (see,
for instance, [BOC 33]), which corresponds to the approach presented in
section 1.9: initially, the integral is defined for simple functions. Then, it is
extended to sequences of simple functions and their limits. The theory
established in this field furnishes conditions that make possible the equality
(see, for example, [DIE 77]):

$$E[Y(\omega)] = \lim_{n \to \infty} E\left[ \sum_{i=1}^{n} A_i(\omega)\, \phi_i \right] = \lim_{n \to \infty} \sum_{i=1}^{n} E[A_i(\omega)]\, \phi_i.$$
In practice, we use an approximation of $Y(\omega)$ by its projection on a
finite-dimensional space $V_N$ having dimension $N$:

$$p_N Y = \sum_{i=1}^{N} A_i\, \phi_i,$$  [6.43]

where $p_N(\cdot)$ is the orthogonal projection from $V$ onto $V_N$. In this case:

$$p_N(E[Y(\omega)]) = \sum_{i=1}^{N} E[A_i]\, \phi_i.$$  [6.44]

Figures 6.11-6.12 show the results obtained by collocation, using a basis
of linear finite elements and a sample of 20 periodic orbits. For oscillator 1,
$x_0 = (1; U)$, where $U$ is uniformly distributed on [0.5; 1.5]. For Van der Pol's
oscillator, we consider $\Omega_0 = U$ (see [CRO 10]).


Figure 6.11. Mean periodic orbit and sample of periodic orbits
in the phase space of oscillator 1 (k = 1, m = 1, $x_1(0) = 1$ and $x_2(0)$
uniform on (0.5; 1.5))

Figure 6.12. Mean periodic orbit and sample of periodic orbits
in the phase space of oscillator 2 ($\varepsilon = 1$ and $\Omega_0$ uniform on (0.5; 1.5))


In the case of oscillator 1, the exact periodic orbit is:

$$Y(t) = \begin{pmatrix} \cos(t) + U \sin(t) \\ -\sin(t) + U \cos(t) \end{pmatrix}$$  [6.45]

and it is possible to determine the exact mean. We observe that the mean
furnished by the method agrees with the exact result.
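The computation of the mean orbit can be sketched as follows for oscillator 1. With a basis of linear finite elements (hat functions) on $I = [0; 1]$, the coefficients $A_i$ obtained by collocation at the nodes are simply the nodal values of the orbit, so that the mean orbit is obtained by averaging the nodal values over the sample. To keep the sketch short, the sample of orbits is generated from the exact expression [6.45] rather than by integrating [6.39]; the number of nodes and this shortcut are assumptions made for the illustration.

```matlab
% Sketch: mean periodic orbit of oscillator 1 (k = m = 1, x0 = (1; U)).
ns = 20;                              % sample size, as in the text
U = 0.5 + rand(ns, 1);                % U uniform on [0.5, 1.5]
s = linspace(0, 1, 101);              % nodes of the linear finite elements on I
t = 2*pi*s;                           % one period, since k = m = 1

% nodal values of each sampled orbit = coefficients A_i in the hat basis
Y1 = ones(ns, 1)*cos(t) + U*sin(t);   % ns x 101, first component
Y2 = -ones(ns, 1)*sin(t) + U*cos(t);  % ns x 101, second component

m1 = mean(Y1, 1); m2 = mean(Y2, 1);   % E[A_i] estimated by the sample mean
e1 = cos(t) + sin(t);                 % exact mean orbit, using E[U] = 1
e2 = -sin(t) + cos(t);
err = max(abs([m1 - e1, m2 - e2]));
fprintf('largest deviation from the exact mean orbit: %.3f\n', err);
```

For this linear system the deviation equals $|\bar{U} - 1|$, where $\bar{U}$ is the sample mean of $U$, so it decreases like $1/\sqrt{ns}$ when the sample size grows.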

If we are interested in the distributions of the coefficients $\{A_i(\omega)\}_{i \ge 1}$, we
may use the methods previously introduced. Let us define:

$$PY = \sum_{n \ge 1} \sum_{j \ge 1} a_{nj}\, \psi_j(\xi)\, \phi_n(\cdot)$$  [6.46]

and look for the deterministic coefficients $a_{nj} \in \mathbb{R}^n$, which may be
determined by one of the methods previously presented, such as collocation.
Figure 6.13 shows the results obtained for Van der Pol's oscillator by using a
sample of 40 limit cycles and a Hilbert basis formed by Hermite polynomials,
with a second-order approximation and an expansion variable $\xi$
uniformly distributed on [0; 1]. The figure shows the 95% confidence
interval for the periodic orbit (dotted line, results obtained by collocation, see
[CRO 10]).


Figure 6.13. Mean periodic orbit and its 95% confidence
interval for oscillator 2 ($\varepsilon = 1$ and $\Omega_0$ uniform on (0.5; 1.5))
Optimization Under Uncertainty


Optimization looks for the best values of some parameters with regard to a
given cost, under given restrictions. If some of these elements involve
parameters collected in a vector $v$, the solution $x$ becomes a function $x(v)$ of
$v$. Thus, when stochastic or variable parameters are part of the optimization
problem, the solutions become stochastic or variable too.

For instance, let us consider the simple situation where we are looking for
the real number $x$ that minimizes the function $F(x) = (x - v_1)^2$ under the
restriction $x \ge v_2$: the obvious solution is

$$x = \begin{cases} v_1, & \text{if } v_1 \ge v_2 \\ v_2, & \text{otherwise} \end{cases}, \qquad F(x) = \begin{cases} 0, & \text{if } v_1 \ge v_2 \\ (v_2 - v_1)^2, & \text{otherwise.} \end{cases}$$

If $v_1$ and $v_2$ are random variables, the optimal point $x$ and the optimal
value $F(x)$ are also random variables. For instance, if they vary both on (0, 1),
$x$ varies on (0, 1) and $F(x)$ varies on (0, 1).
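A small simulation makes the randomness of the optimum concrete. The sketch below samples $v_1$ and $v_2$ independently and uniformly on (0, 1), evaluates the closed-form solution of the constrained problem for each draw, and reports the observed ranges and means. The independence and the uniform distribution are assumptions made for the illustration.

```matlab
% Sketch: distribution of the optimum of F(x) = (x - v1)^2 subject to x >= v2.
ns = 10000;                       % sample size (assumed)
v1 = rand(ns, 1);                 % v1 uniform on (0,1) (assumed)
v2 = rand(ns, 1);                 % v2 uniform on (0,1), independent of v1 (assumed)

x = max(v1, v2);                  % x = v1 if v1 >= v2, x = v2 otherwise
F = (x - v1).^2;                  % optimal value: 0 or (v2 - v1)^2

fprintf('x in [%.3f, %.3f], mean %.3f\n', min(x), max(x), mean(x));
fprintf('F in [%.3f, %.3f], mean %.3f\n', min(F), max(F), mean(F));
```

The arrays `x` and `F` are samples of the optimal point and of the optimal value, from which histograms or other statistics may be computed.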

An example of catastrophic variations is furnished by the minimization of
$F(x_1, x_2) = ((v_1 - 1)^2 + 1) x_1^2 - 4 (v_1 - 1) x_1 x_2 + 2 x_1 - 2 x_2 + ((v_1 - 1)^2 + 1) x_2^2$.
In this case, the obvious solution is, for $v_1 \ne 0$, $x_1 = -1/v_1^2$, $x_2 = 1/v_1^2$.
For $v_1 = 0$, there are infinitely many solutions. If the parameter $v_1$ varies on
$[-\varepsilon, \varepsilon]$, the values of $x_1$ may vary from $-\infty$ to $-1/\varepsilon^2$ and those of $x_2$ from
$1/\varepsilon^2$ to $\infty$.
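The sensitivity of this optimum can be checked numerically by solving the stationarity conditions $\nabla F = 0$, which form a 2 x 2 linear system. The sketch below does so for a few values of $v_1$ approaching 0; the list of values is an assumption made for the illustration.

```matlab
% Sketch: minimizer of F(x1,x2) as a function of the parameter v1.
v1 = [1 0.5 0.1 0.01];                 % assumed parameter values
xopt = zeros(2, numel(v1));
for k = 1:numel(v1)
    c = v1(k) - 1;
    A = c^2 + 1;
    Hm = [2*A, -4*c; -4*c, 2*A];       % Hessian of F (constant in x)
    g0 = [2; -2];                      % gradient of F at x = 0
    xopt(:, k) = -(Hm \ g0);           % solves grad F = Hm*x + g0 = 0
    fprintf('v1 = %5.2f: x1 = %10.2f, x2 = %10.2f\n', ...
        v1(k), xopt(1, k), xopt(2, k));
end
```

The computed minimizers reproduce $x_1 = -1/v_1^2$ and $x_2 = 1/v_1^2$, and they grow without bound as $v_1$ approaches 0.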

These simple examples show that the variability of the optima has to be
taken into account in order to ensure security and reliability. In this chapter,
we are interested in the determination of the distribution of the optimal point.
An alternative approach that looks for a solution satisfying restrictions with a
given probability is reliability-based optimization, presented in Chapter 8.

7.1. Representation of the solutions in unconstrained optimization 

A deterministic unconstrained optimization problem (UOP) is:

$$x = \arg\min \{ F(y) : y \in \mathbb{R}^n \}$$

where $F: \mathcal{M}(n, 1) \to \mathbb{R}$ is a real-valued function and $x = (x_j) \in \mathcal{M}(n, 1)$
is the unknown. However, in many practical situations, $F$ depends also on
parameters affected by some uncertainty. In such a situation, the UOP
formulated above becomes a UOP under uncertainty. For instance, we may consider
that the uncertainty is represented by an uncertain vector $v = (v_1, ..., v_{nr})$ and
that $F = F(x, v)$. Thus, the UOP becomes

$$X = \arg\min \{ F(Y, v) : Y \in \mathbb{R}^n \}.$$  [7.1]

The situation of equation [7.1] may be studied by different techniques, 
such as stochastic programming, reliability-based design optimization and 
fuzzy programming. Stochastic and reliability approaches model the 
uncertainty by using random variables, while the fuzzy approach considers 
uncertain parameters as fuzzy numbers. Usually, stochastic and reliability 
approaches are used when the uncertainty may be properly addressed by 
known random processes or variables, while the fuzzy approach is used in 
situations where the uncertainty cannot be easily quantified. Taking into 
account the presence of v, X becomes uncertain. 

Here, we are interested in the situation where $X$ is considered a random
variable and our goal is the numerical determination of its probability
distribution. This approach differs from those cited above: stochastic
programming generally involves the minimization of statistical characteristics
of $F$, for instance its mean or variance; reliability-based design optimization
introduces probabilistic constraints and looks for the minimization of $F$
subject to these constraints; fuzzy optimization produces fuzzy vectors and not
probability distributions. We observe that the knowledge of the probability
distribution of $X$ allows the determination of statistics of the solution $X$
and of the optimal value of $F$.

Analogously to the previous situations, we consider an approximation $PX$
of $X$ in a convenient subspace $S$ of random variables (see section 3). For
instance, let $\mathcal{F} = \{\varphi_k\}_{k \in \mathbb{N}}$ be a convenient
family of functions, $N_X \in \mathbb{N}^*$, $\xi$ a convenient random variable,
$\varphi(\xi) = (\varphi_1(\xi), \dots, \varphi_{N_X}(\xi))^t \in \mathcal{M}(N_X, 1)$
and

$$PX = \chi\varphi(\xi) = \sum_{k=1}^{N_X} \chi_k \varphi_k(\xi), \quad \text{i.e.} \quad (PX)_j = \sum_{k=1}^{N_X} \chi_{jk}\varphi_k(\xi), \qquad [7.2]$$

which corresponds to

$$S = [\{\varphi_1(\xi), \dots, \varphi_{N_X}(\xi)\}]^n = \left\{ \sum_{k=1}^{N_X} D_k \varphi_k(\xi) : D_k \in \mathbb{R}^n,\ 1 \le k \le N_X \right\}. \qquad [7.3]$$

As previously, the unknown to be determined is $\chi = (\chi_{ij}) \in \mathcal{M}(n, N_X)$.
For instance, equation [7.1] may be approximated as

$$PX = \arg\min \{F(Y, v) : Y \in S\} \qquad [7.4]$$

and $\chi$ may be determined by solving equation [7.4] by one of the approaches
discussed in the following.


7.1.1. Collocation

Analogously to the preceding situations, we may assume that a sample
$\mathbf{X} = (X_1, \dots, X_{ns})$ of $ns$ variates from $X$ is available and
solve the system of equations:

$$PX(\xi_i) = X_i, \quad i = 1, \dots, ns. \qquad [7.5]$$

This approach is implemented in Matlab by using the programs given in 
section 3.3. 
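As an illustration of equation [7.5], the lines below give a minimal sketch of collocation and do not reproduce the programs of section 3.3. They assume $\xi = v$, use the exact optimum of a problem with a single parameter (the one stated in Example 7.2) as data, and take a hypothetical polynomial basis `phi`; when $ns > N_X$, the system is solved in the least-squares sense.

```matlab
% Sketch of collocation [7.5]; the basis phi is an assumption of this example.
ns = 11;  N_X = 5;
v = linspace(-1, 1, ns);                          % sample of the parameter
d = v.^2 + 3*v + 3;
Xs = [(v.^2 + v - 3) ./ d; (-v.^2 + v + 3) ./ d]; % variates X_i, 2 x ns
phi = @(k, t) ((t + 1) / 2).^(k - 1);             % hypothetical basis functions
Phi = zeros(N_X, ns);
for k = 1:N_X
    Phi(k, :) = phi(k, v);                        % Phi(k,i) = phi_k(xi_i)
end
chi = Xs / Phi;                                   % least-squares solution of chi*Phi = Xs
PXs = chi * Phi;                                  % PX evaluated at the sample points
err = max(abs(PXs(:) - Xs(:)));
fprintf('max collocation error = %g\n', err);
```

The operator `/` is used because the unknown `chi` multiplies `Phi` from the left; each row of `chi` contains the coefficients of one component of $PX$.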

Example 7.1.- Let us consider the situation where $x = (x_1, x_2)$ and

$$F(x, v) = (x_1 - v_1 v_2)^2 + (x_2 - v_1 - v_2)^2.$$

In this case:

$$X = \begin{pmatrix} v_1 v_2 \\ v_1 + v_2 \end{pmatrix}.$$

We consider $(v_1, v_2)$ as a couple of independent variables, with $v_i$
uniformly distributed over $(a_i, b_i)$. The approximations use

$$\varphi_k(v) = \left(\frac{v_1 - a_1}{b_1 - a_1}\right)^r \left(\frac{v_2 - a_2}{b_2 - a_2}\right)^s, \quad k = s n_1 + r, \qquad [7.6]$$

where $n_1 > 0$, $n_2 > 0$, $0 \le r \le n_1$, $0 \le s \le n_2$. The calculations
use $a_1 = 0$, $b_1 = 1$, $a_2 = -1$, $b_2 = 1$, $n_1 = 3$, $n_2 = 3$. The results
are shown in Figures 7.1 and 7.2.



Figure 7.1. Results obtained for $X_1$ in Example 7.1 (ns = 6
equally spaced $v_i$). For a color version of the figure,
see www.iste.co.uk/souzadecursi/quantification.zip


Example 7.2.- Let us consider the situation where $x = (x_1, x_2)$ and

$$F(x, v) = ((2v_1 + 3)x_1 + (v_1 + 3)x_2 - v_1)^2 + ((v_1 + 1)x_1 + (v_1 + 2)x_2 - 1)^2.$$

In this case,

$$X = \frac{1}{v_1^2 + 3v_1 + 3} \begin{pmatrix} v_1^2 + v_1 - 3 \\ -v_1^2 + v_1 + 3 \end{pmatrix}.$$

Figure 7.2. Results obtained for $X_2$ in Example 7.1 (ns = 11
equally spaced $v_i$). For a color version of the figure,
see www.iste.co.uk/souzadecursi/quantification.zip


We consider $v_1$ as uniformly distributed on $(-1, 1)$ and $ns = 11$ equally
spaced values $v_i$, with $v_1 = -1$, $v_{ns} = 1$. The procedure is applied with
$N_X = 5$ and a polynomial basis


Vk(v) = 


v + 1 


The results are shown in Figure 7.3.




Figure 7.3. Results obtained in Example 7.2 (numerical quadrature). For a
color version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip

7.1.2. Moment matching

As explained in Chapter 3, a sample may be used in order to determine the
coefficients $\chi$ such that the approximated moments $M^a(\chi)$ match the
empirical moments $M^e$ (moment matching method). This may be accomplished either
by solving the nonlinear equations $M^a(\chi) = M^e$ or by minimizing a
pseudo-distance $d(M^a(\chi), M^e)$.

This approach is implemented in Matlab by using the programs given in 
section 3.2. 

Example 7.3.- Let us consider the situation described in Example 7.1 and a
sample formed by $np^2$ points $((v_1)_i, (v_2)_j)$, with, for each $r$, $np$
values $(v_r)_s$ equally spaced over $(a_r, b_r)$. We consider the empirical
moments $E(x_1^i x_2^j)$, with $i, j \le m$. Some results are shown in Figure 7.4.
We observe that the approach remains effective when $\xi \neq v$: when the values
of $v$ are not known, but only the values of $x_1$ and $x_2$, we may introduce
$(\xi_r)_s$ equally spaced over $(a_r, b_r)$ and consider $x_1 = x_1(\xi)$,
$x_2 = x_2(\xi)$. For instance, we consider a random sample of $np^2$ points
$((v_1)_i, (v_2)_j)$, increasingly ordered for each component. The equally spaced
values of $\xi$ are used in order to fit the empirical moments.

Example 7.4.- Let us consider the situation described in Example 7.2 and a
sample formed by $ns = 11$ equally spaced values of $v_i$. The results furnished
by the moment fitting method are shown in Figure 7.5. It is interesting to
notice that, for $N_X = 7$, the variables themselves are correctly approximated
(Figure 7.6); for the other values of $N_X$, only the distribution is fitted.


7.1.3. Stochastic optimization

The unknown coefficients may also be determined by minimizing one of
the statistics of $F$. For instance, we may minimize its mean:

$$\chi = \arg\min \{E(F(Y\varphi(\xi), v)) : Y \in \mathcal{M}(n, N_X)\}. \qquad [7.7]$$

This problem may be solved by stochastic optimization methods, such as
stochastic quasi-gradient methods (see, for instance, [KLE 01]).

Figure 7.4. Results obtained in Example 7.3 (panels: approximation with
$n_1 = 5$, $n_2 = 5$, $m = 4$; approximation with $n_1 = 5$, $n_2 = 5$, $m = 5$).
For a color version of the figure,
see www.iste.co.uk/souzadecursi/quantification.zip
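To make the minimization of the mean in equation [7.7] concrete, the lines below sketch a plain stochastic gradient iteration with a constant step. This is a simplified stand-in chosen for illustration, not the quasi-gradient method of [KLE 01]. The sketch assumes $\xi = v$, the objective $F(x, v) = (x_1 - v_1 v_2)^2 + (x_2 - v_1 - v_2)^2$ with $v_1$ uniform on $(0, 1)$ and $v_2$ uniform on $(-1, 1)$, and a hypothetical basis $(1, v_1, v_2, v_1 v_2)$; the step `mu` and the number of iterations are arbitrary choices.

```matlab
% Sketch: stochastic gradient iteration for the mean of F in [7.7].
% Assumptions: xi = v, basis phi = (1, v1, v2, v1*v2), constant step mu.
phi = @(v) [1; v(1); v(2); v(1)*v(2)];
xstar = @(v) [v(1)*v(2); v(1) + v(2)];   % F(x,v) = norm(x - xstar(v))^2
chi = zeros(2, 4);                       % initial coefficients
mu = 0.1;                                % hypothetical step size
for it = 1:20000
    v = [rand(); 2*rand() - 1];          % draw v1 ~ U(0,1), v2 ~ U(-1,1)
    p = phi(v);
    r = chi*p - xstar(v);                % residual: F(chi*p, v) = r'*r
    chi = chi - mu * 2 * r * p';         % gradient of F with respect to chi
end
disp(chi);
```

Since the exact optimum belongs to the span of the chosen basis, the iterates approach the coefficients of $v_1 v_2$ and $v_1 + v_2$; with a basis that cannot represent the optimum, a decreasing step would be the usual choice.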


Example 7.5.- Let us consider the situation described in Example 7.3. We
consider the resolution of equation [7.7] with the expectations approximated
by the empirical means using the sample points. Some results are shown in
Figure 7.7. We observe that the method remains effective even for $\xi \neq v$.

Example 7.6.- Let us consider the situation described in Example 7.4 and
the resolution of equation [7.7] with the expectations approximated by the
empirical means obtained by using the sample points. The results are shown
in Figure 7.8. The results for $\xi \neq v$ are shown in Figure 7.9.

7.1.4. Adaptation of iterative methods

The approach used in section 5.1.4 may also be used here. For instance, let
us consider the iterative solution of equation [7.1] by using an iteration
function $\Phi$:

$$X^{(p+1)} = \Phi\left(X^{(p)}\right). \qquad [7.8]$$

Figure 7.5. Results obtained in Example 7.4 ($\xi$ uniformly
distributed, random $v$); panel annotations: exact CDF, mean square
error = 0.00386359, approximation with $n_x = 6$, $m = 3$, approximation
with $n_x = 7$, $m = 4$. For a color version of the figure,
see www.iste.co.uk/souzadecursi/quantification.zip

Figure 7.6. Results obtained in Example 7.4 ($\xi$ uniformly
distributed, random $v$). For a color version of the figure,
see www.iste.co.uk/souzadecursi/quantification.zip

Figure 7.7. Results obtained in Example 7.5 (uniformly
distributed data). For a color version of the figure,
see www.iste.co.uk/souzadecursi/quantification.zip

Figure 7.8. Results obtained in Example 7.6 (uniformly
distributed data). For a color version of the figure,
see www.iste.co.uk/souzadecursi/quantification.zip

Figure 7.9. Results obtained in Example 7.5 ($\xi$ uniformly
distributed, random $v$). For a color version of the figure,
see www.iste.co.uk/souzadecursi/quantification.zip

Analogously to section 5.1.4, we may generate a sequence
$\{\chi^{(p)}\}_{p \ge 0}$ starting from an initial guess $\chi^{(0)}$ by solving
a sequence of linear systems

$$AC = B, \qquad [7.9]$$

where

$$A_{\alpha\beta} = \mathcal{A}_{rsjk}, \quad B_\alpha = \mathcal{B}_{rs}, \quad C_\beta = \chi^{(p+1)}_{jk}, \quad \alpha = ind(r, s), \quad \beta = ind(j, k), \qquad [7.10]$$

$$\mathcal{A}_{rsjk} = \delta_{jr} E(\varphi_s(\xi)\varphi_k(\xi)), \quad \mathcal{B}_{rs} = E\left(\varphi_s(\xi)\,\Phi_r\left(PX^{(p)}\right)\right). \qquad [7.11]$$

This approach is implemented in Matlab by a simple modification of the
code given in section 5.1.4: we modify program iteration_sample in order
to furnish $\chi$ instead of $\Delta\chi$. An example of code is given below:

Listing 7.1. UQ by adaptation in unconstrained optimization

function chi = expcoef(f_iter, chi_ini, vs, phi, nitmax, errmax)
%
% determines the coefficients of the expansion
% by using a sample
%
% IN:
% f_iter : iteration function - type anonymous function
% chi_ini : initial n x N_X matrix of the coefficients - type array of double
% vs : table of values of v - type array of double
%      vs(:,i) is a variate from v
% phi : basis function phi(k,v) - type anonymous function
% nitmax : max iteration number - type integer
% errmax : max precision - type double
%
% OUT:
% chi : n x N_X matrix of the coefficients - type array of double
%
[n, N_X] = size(chi_ini);
M = fixed_variational_matrix(phi, vs, N_X);
A = tab4(M, n, N_X);
chi = chi_ini;
nit = 0;
not_stop = true;
while not_stop
    nit = nit + 1;
    chi_new = iteration_sample(f_iter, chi, vs, phi, A);
    delta_chi = chi_new - chi;
    chi = chi_new;
    err = norm(delta_chi);
    not_stop = nit < nitmax && err > errmax;
end;
return;
end

function w = function_proj(f, chi, vs, phi)
%
% maps f( PX(v), v ) on the sample vs from v
% f is assumed to have the same dimension as the
% number of lines of chi
%
% IN:
% f : the function to be evaluated - type anonymous function
% chi : n x N_X matrix of the coefficients - type array of double
% vs : table of values of v - type array of double
%      vs(:,i) is a variate from v
% phi : basis function phi(k,v) - type anonymous function
%
% OUT:
% w : n x ns table of values of f - type array of double
%
ns = size(vs, 2);              % one variate per column of vs
n = size(chi, 1);
PXs = projection(chi, vs, phi);
w = zeros(n, ns);
for i = 1:ns
    w(:, i) = f(PXs(:, i), vs(:, i));
end;
return;
end

function A = tab4(M, n, N_X)
%
% generates the table A from the table of
% scalar products of the basis functions
% M_ij = E( phi_i(v) phi_j(v) )
%
% IN:
% M : N_X x N_X table of scalar products - type array of double
% n : number of unknowns (length of X) - type integer
% N_X : order of the expansion - type integer
%
% OUT:
% A : nN_X x nN_X table - type array of double
%     contains A(alpha, beta)
%
aaaa = zeros(n, N_X, n, N_X);
for r = 1:n
    for s = 1:N_X
        for j = 1:n
            for k = s:N_X
                if r == j
                    aux = M(s, k);
                    aaaa(r, s, j, k) = aux;
                    aaaa(j, k, r, s) = aux;
                end;
            end;
        end;
    end;
end;
nn = n*N_X;
A = zeros(nn, nn);
for r = 1:n
    for s = 1:N_X
        alffa = index_map(r, s, n, N_X);
        for j = 1:n
            for k = 1:N_X
                betta = index_map(j, k, n, N_X);
                A(alffa, betta) = aaaa(r, s, j, k);
            end;
        end;
    end;
end;
return;
end

function B = tab2(N, n, N_X)
%
% generates the table B from table N
% N_rs = E( phi_s(v) f_r(PX(v)) )
%
% IN:
% N : n x N_X table of scalar products - type array of double
% n : number of unknowns (length of X) - type integer
% N_X : order of the expansion - type integer
%
% OUT:
% B : nN_X x 1 table - type array of double
%     contains B(alpha)
%
nn = n*N_X;
B = zeros(nn, 1);
for r = 1:n
    for s = 1:N_X
        alffa = index_map(r, s, n, N_X);
        B(alffa) = N(r, s);
    end;
end;
return;
end

function M = fixed_variational_matrix(phi, vs, N_X)
%
% generates the matrix M such that
% M_ij = E( phi_i(v) phi_j(v) )
%
% IN:
% phi : basis function phi(k,v) - type anonymous function
% vs : table of values of v - type array of double
%      vs(:,i) is a variate from v
% N_X : order of the expansion - type integer
%
% OUT:
% M : N_X x N_X table of scalar products - type array of double
%
M = zeros(N_X, N_X);
for i = 1:N_X
    f1 = @(U) phi(i, U);
    Y = map(f1, vs, 1);
    M(i, i) = scalprod(Y, Y);
    for j = i + 1:N_X
        f1 = @(U) phi(j, U);
        Z = map(f1, vs, 1);
        aux = scalprod(Y, Z);
        M(i, j) = aux;
        M(j, i) = aux;
    end;
end;
return;
end

function N = iteration_variational_matrix(phi, f, chi, vs, n, N_X)
%
% generates the matrix N such that
% N_rs = E( phi_s(v) f_r(X(v)) )
% assumes that f furnishes a vector of length n
% and the number of lines of chi is also n
%
% IN:
% phi : basis function phi(k,v) - type anonymous function
% f : iteration function - type anonymous function
% chi : n x N_X matrix of the coefficients - type array of double
% vs : table of values of v - type array of double
%      vs(:,i) is a variate from v
% n : number of unknowns (length of X) - type integer
% N_X : order of the expansion - type integer
%
% OUT:
% N : n x N_X table of scalar products - type array of double
%
N = zeros(n, N_X);
w = function_proj(f, chi, vs, phi);
for r = 1:n
    Y = w(r, :);
    for s = 1:N_X
        f1 = @(U) phi(s, U);
        Z = map(f1, vs, 1);
        aux = scalprod(Y, Z);
        N(r, s) = aux;
    end;
end;
return;
end

function chi = iteration_sample(f_iter, chi_old, vs, phi, A)
%
% evaluates the new coefficients
%
% IN:
% f_iter : iteration function - type anonymous function
% chi_old : n x N_X matrix of the coefficients - type array of double
% vs : table of values of v - type array of double
%      vs(:,i) is a variate from v
% phi : basis function phi(k,v) - type anonymous function
% A : nN_X x nN_X table - type array of double
%     contains A(alpha, beta)
%
% OUT:
% chi : new n x N_X matrix of the coefficients - type array of double
%
n = size(chi_old, 1);
N_X = size(chi_old, 2);
N = iteration_variational_matrix(phi, f_iter, chi_old, vs, n, N_X);
B = tab2(N, n, N_X);
C = A \ B;
chi = zeros(size(chi_old));
for r = 1:n
    for s = 1:N_X
        alffa = index_map(r, s, n, N_X);
        chi(r, s) = C(alffa);
    end;
end;
return;
end

function v = index_map(i, j, n, N_X)
v = i + (j - 1)*n;
return;
end
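The structure of the linear system [7.9] can be examined on a small case. The following sketch is an illustration only: it assumes the index map $ind(r, s) = r + (s - 1)n$ used in index_map and takes arbitrary values for the tables `M` and `N`. Under these assumptions, the table $A$ of equations [7.10] and [7.11] coincides with `kron(M, eye(n))`, and solving $AC = B$ is equivalent to solving $\chi M = N$.

```matlab
% Sketch: A(alpha,beta) = delta_{jr} * M(s,k), with alpha = r + (s-1)*n,
% coincides with kron(M, eye(n)). M and N are arbitrary example data.
n = 2;  N_X = 3;
M = [1, 1/2, 1/4; 1/2, 1/3, 1/5; 1/4, 1/5, 1/7];   % symmetric positive definite
ind = @(i, j) i + (j - 1)*n;                 % same rule as index_map
A = zeros(n*N_X);
for r = 1:n
    for s = 1:N_X
        for k = 1:N_X
            A(ind(r, s), ind(r, k)) = M(s, k);   % only j = r contributes
        end
    end
end
A_kron = kron(M, eye(n));
N = [1, 2, 3; 4, 5, 6];                      % hypothetical n x N_X right-hand side
C = A \ N(:);                                % N(:) stacks B(ind(r,s)) = N(r,s)
chi = reshape(C, n, N_X);                    % chi(r,s) = C(ind(r,s))
```

This observation suggests that, for large $n$, one may solve the $N_X \times N_X$ system `chi = N / M` instead of assembling the full table.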


Example 7.7.- Let us consider the situation described in Example 7.3. We
consider gradient descent using a fixed step $\mu = 0.1$, with a starting point
determined by the intrinsic Nelder-Mead method furnished by Matlab. As
in the previous situations, the empirical means are approximately evaluated
by using sample points forming a uniform grid on the region, as described in
Example 7.3. The results obtained for $np = 11$ are shown in Figure 7.10.

Figure 7.10. Results obtained for the variables in Example 7.7 (uniformly
distributed data); panels: exact component 2, approximation with $n_1 = 4$,
$n_2 = 4$, $np = 11$. For a color version of the figure,
see www.iste.co.uk/souzadecursi/quantification.zip

Example 7.8.- We consider again the situation described in Example 7.4.
As in the previous example, we consider gradient descent using a fixed step
$\mu = 0.1$, with a starting point determined by the intrinsic Nelder-Mead
method furnished by Matlab, the empirical means being evaluated by using the
sample of $ns = 11$ equally spaced points, as described in Example 7.4. The
results are shown in Figure 7.12.

7.1.5. Optimal criteria

In some situations, namely for a convex objective function $F$, optimality
criteria may be established under the form of systems of algebraic equations
and the methods presented in section 5.1 may be used. For instance, we may
consider the equations

$$\nabla F(x, v) = 0, \qquad [7.12]$$

which are satisfied by the optimal point. The methods presented in section
5.1 may be applied in order to determine an approximation $PX$, as described
above. In particular, the Matlab implementation is identical.

Example 7.9.- We consider the situation described in Example 7.3. Equation
[7.12] is solved by the variational approach, with the expectations
approximated by the empirical means using the sample points. The results
obtained for the variables themselves are shown in Figure 7.13. A comparison
of the cumulative distribution functions is furnished in Figure 7.14.

Figure 7.13. Results obtained for the variables in Example 7.9 (uniformly
distributed data); panels: exact component 2, approximation with $n_1 = 4$,
$n_2 = 4$, $np = 11$. For a color version of the figure,
see www.iste.co.uk/souzadecursi/quantification.zip

Figure 7.14. Results obtained for the CDF in Example 7.10 (uniformly
distributed data); panel: exact CDF. For a color version of the figure,
see www.iste.co.uk/souzadecursi/quantification.zip

Example 7.10.- We now consider the situation described in Example 7.4.
Equation [7.12] is solved again by the variational approach, with the
expectations approximated by the empirical means using the $ns = 11$ sample
points. The results are shown in Figure 7.15.

Figure 7.15. Results obtained in Example 7.10 (uniformly
distributed data); panels: $X_1$, $n = 7$ and $X_2$, $n = 7$. For a color
version of the figure, see www.iste.co.uk/souzadecursi/quantification.zip



364 Uncertainty Quantification and Stochastic Modeling with Matlab® 


7.2. Stochastic methods in deterministic continuous optimization 

This section presents stochastic numerical algorithms for the determination 
of the points of global minima of deterministic regular functions, i.e. for the 
global optimization of continuous functions. 

This area is a subject of extensive research, and readers may find in the
literature a wide variety of approaches. This variety prevents any attempt at
an exhaustive presentation and condemns all texts to a partial one. The present
text is no exception, and readers may not find their preferred algorithm below.

Our focus is on the construction of numerical methods. More than the simple
description of algorithms, this chapter aims to give readers the basic
principles for the construction of stochastic algorithms, which are based on
the fundamental theorems presented in the following. The examples presented
must be considered as illustrations of the methodology. For a specific
situation, readers must consider the probable existence of adapted algorithms,
specifically conceived for the situation under analysis and exploiting
particularities of the problem in order to increase efficiency. These
algorithms may be used as seed algorithms and combined with those presented
here by using the procedures described in the following.

The theorems presented may be interpreted as follows: generating random 
points with a strictly positive probability for any region of the working space 
leads to a strictly positive probability of obtaining a random point in a given 
neighborhood of a point of global minimum. In addition, the probability of 
generating a random point in such a neighborhood on a large number of draws 
increases (namely, it converges to one for an infinite number of trials). 

From these observations, a first and simple idea results for the
minimization of a regular function $f : \mathbb{R}^k \to \mathbb{R}$. We may,
for instance, randomly generate $nr$ points $\{x_i : 1 \le i \le nr\}$ and
estimate the value $m$ of the minimum by using the values
$\{f(x_i) : 1 \le i \le nr\}$. In practice, it may be more convenient to
generate trial points in a "controlled" neighborhood of a point: we generate
the point $x_{i+1}$ by means of a displacement $\Delta x_i$ of the current point
$x_i$. In this case, we obtain the following algorithm:

1) Initialization: a starting point $x_0$ and a maximum iteration number $nm$
are given. Set the iteration number to zero: $n \leftarrow 0$;

2) Iterations: the current point is $x_n$ and we determine $x_{n+1}$ in two
substeps:

- drawing: randomly generate an increment $\Delta x_n$;

- dynamics: determine $x_{n+1}$ from $x_n$ and $t_n = x_n + \Delta x_n$. For
instance, use an elitist dynamics given by

$$x_{n+1} = \arg\min \{f(x_n), f(t_n)\};$$

3) Stopping test: if $n < nm$, then increment $n$: $n \leftarrow n + 1$ and
return to step 2. Otherwise, stop the iterations and estimate
$m \approx f(x_n)$ and $x^* \approx x_n$.

The epithet elitist is justified by the systematic rejection of the elements
that do not improve the value of $f$. We have

$$x_{n+1} = \begin{cases} x_n, & \text{if } f(x_n) < f(t_n) \\ t_n, & \text{if } f(x_n) \ge f(t_n). \end{cases}$$

Usually, this dynamics is written under the synthetic form

$$x_{n+1} = D_n x_n + (1 - D_n) t_n,$$

where

$$P(D_n = 0 \mid f(x_n) < f(t_n)) = 0; \quad P(D_n = 1 \mid f(x_n) < f(t_n)) = 1;$$

$$P(D_n = 0 \mid f(x_n) \ge f(t_n)) = 1; \quad P(D_n = 1 \mid f(x_n) \ge f(t_n)) = 0.$$

By assuming that a subprogram alea(n,nt) furnishes an n x nt table of
random values, the Matlab implementation reads as:

Listing 7.2. Stochastic descent 

function [xsol, fsol] = stoc_desc(x0, ntmax, ncmax, alea)
%
% performs ncmax cycles involving ntmax trial points
% of stochastic descent starting from x0
% assumes that x0 is n x 1
%
% IN:
% x0 : n x 1 vector - type array of double
% ntmax : maximum number of trial points by cycle - type integer
% ncmax : maximum number of cycles - type integer
% alea : generator of random points - type anonymous function
%        alea(n,nt) furnishes a n x nt table of random values
%
dynamics = @elitist_dynamics;
[xsol, fsol] = stoc_iterations(x0, ntmax, ncmax, alea, dynamics);
return;
end

function [xsol, fsol] = stoc_iterations(x0, ntmax, ncmax, alea, dynamics)
%
% performs ncmax cycles involving ntmax trial points
% starting from x0
% assumes that x0 is n x 1
% assumes that the objective function f is accessible as f(x)
%
% IN:
% x0 : n x 1 vector - type array of double
% ntmax : maximum number of trial points by cycle - type integer
% ncmax : maximum number of cycles - type integer
% alea : generator of random points - type anonymous function
%        alea(n,nt) furnishes a n x nt table of random values
% dynamics : the dynamics to be used - type anonymous function
%
xac = x0;
fac = f(x0);
for nc = 1:ncmax
    xc = xac;
    fc = fac;
    for nit = 1:ntmax
        xt = trial_point(xc, 1, alea);
        ft = f(xt);
        [xc, fc] = dynamics(xc, fc, xt, ft, nit);
    end;
    xac = xc;
    fac = fc;
end;
xsol = xac;
fsol = fac;
return;
end

function xt = trial_point(xac, nt, alea)
%
% generates nt trial points by perturbation of
% the actual point xac
% assumes that xac is n x 1
%
% IN:
% xac : n x 1 vector - type array of double
% nt : number of trial points - type integer
% alea : generator of random points - type anonymous function
%        alea(n,nt) furnishes a n x nt table of random values
%
n = length(xac);
xt = zeros(n, nt);
dx = alea(n, nt);
for i = 1:nt
    xt(:, i) = xac + dx(:, i);
end;
return;
end

function [xd, fd] = elitist_dynamics(xac, fac, xt, ft, nit)
%
% applies the dynamics by selection of the best point
%
if ft < fac
    xd = xt;
    fd = ft;
else
    xd = xac;
    fd = fac;
end;
return;
end
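The behavior of the elitist dynamics may be observed with a short self-contained script. The lines below are only a sketch: the objective `f`, the generator `alea`, the starting point and the number of iterations are hypothetical choices made for this illustration, and the loop condenses the drawing and dynamics substeps into a single cycle.

```matlab
% Sketch: elitist stochastic descent on a non-convex test function.
% f, alea, the starting point and the iteration count are hypothetical choices.
f = @(x) sum(x.^2) + 2*sum(1 - cos(3*x));  % global minimum 0 at x = 0
alea = @(n, nt) randn(n, nt);              % Gaussian increments
x = [2; -1.5];                             % starting point x_0
fx = f(x);
f0 = fx;
for n = 1:5000
    t = x + alea(numel(x), 1);             % drawing: t_n = x_n + dx_n
    ft = f(t);
    if ft < fx                             % elitist dynamics: keep the best point
        x = t;
        fx = ft;
    end
end
fprintf('f(x0) = %g, f(x_n) = %g\n', f0, fx);
```

Because the Gaussian density is strictly positive everywhere, any neighborhood of the global minimum may be reached from any point, which is the situation covered by the theorems of this section. The value of `f` never increases along the iterations.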


This simple algorithm is usually referred to as the stochastic descent method.
It is supported by a convergence theorem with quite general assumptions,
which include non-convex objective functions $f$:

$$\inf \{f(x) : x \in \mathbb{R}^k\} = m \in \mathbb{R}; \qquad [7.13]$$

$$\forall \lambda > m : S_\lambda = \{x \in \mathbb{R}^k : f(x) < \lambda\} \text{ is bounded, with non-empty interior}; \qquad [7.14]$$

$$\forall \lambda > m : S_\lambda^c = \{x \in \mathbb{R}^k : f(x) \ge \lambda\} \text{ has a non-empty interior}. \qquad [7.15]$$

Assumptions [7.14] and [7.15] are satisfied, for instance, when $f$ is
continuous and coercive or when $f$ is continuous and we are interested in the
minimum on a bounded subset. Assumption [7.15] guarantees that
$P(x_n \notin S_\lambda) > 0$ and may be weakened by considering the values
$\lambda \in (m, m + \varepsilon)$ for which this assumption is satisfied.
However, the convergence may be slow, even extremely slow, in large
dimensions: in practice, the efficiency of a method purely based on the
generation of trial points is poor. Thus, modifications of this basic method
have been introduced and have led to other versions of the fundamental
theorem. For instance, the use of the dynamics of Metropolis instead of the
elitist one leads to the algorithm of simulated annealing, which is based on
version 2 of the fundamental theorem. In the dynamics of Metropolis,

$$P(D_n = 0 \mid f(x_n) < f(t_n)) = c_n; \quad P(D_n = 1 \mid f(x_n) < f(t_n)) = 1 - c_n;$$

$$P(D_n = 0 \mid f(x_n) \ge f(t_n)) = 1; \quad P(D_n = 1 \mid f(x_n) \ge f(t_n)) = 0;$$

$$c_n = \exp\left(-\left(\frac{f(t_n) - f(x_n) + \alpha}{\theta_n} + \beta\right)\right),$$

where $\alpha > 0$, $\beta > 0$ and $\{\theta_n\}_{n \in \mathbb{N}}$ is a
sequence of strictly positive real numbers such that $\theta_n \to 0$ "slowly
enough" (in general, as $1/\sqrt{\log(n)}$). In this case,

$$x_{n+1} = \begin{cases} \begin{cases} x_n, & \text{with probability } 1 - c_n \\ t_n, & \text{with probability } c_n \end{cases} & \text{if } f(x_n) < f(t_n) \\ t_n, & \text{if } f(x_n) \ge f(t_n). \end{cases}$$

The Matlab implementation consists of a small modification of the preceding
code:

Listing 7.3. Stochastic descent with Metropolis' dynamics

function [xsol, fsol] = stoc_desc(x0, ntmax, ncmax, alea, a, b, c, aalfa, bbeta)
%
% performs ncmax cycles involving ntmax trial points
% of stochastic descent starting from x0
% assumes that x0 is n x 1
%
% IN:
% x0 : n x 1 vector - type array of double
% ntmax : maximum number of trial points by cycle - type integer
% ncmax : maximum number of cycles - type integer
% alea : generator of random points - type anonymous function
%        alea(n,nt) furnishes a n x nt table of random values
% a, b, c : parameters transmitted to teta_fading - type double
% aalfa, bbeta : parameters of the dynamics of Metropolis - type double
%
teta = @(nit) teta_fading(a, b, c, nit);
dynamics = @(xc, fc, xt, ft, nit) metropolis_dynamics(xc, fc, xt, ft, aalfa, bbeta, nit, teta);
[xsol, fsol] = stoc_iterations(x0, ntmax, ncmax, alea, dynamics);
return;
end

function [xd, fd] = metropolis_dynamics(xac, fac, xt, ft, a, b, nit, teta)
%
% applies the dynamics of Metropolis
%
if ft < fac
    xd = xt;
    fd = ft;
else
    teta_n = teta(nit);
    aux = -(b + (ft - fac + a)/teta_n);
    c = exp(aux);
    test = rand();
    if test < c
        xd = xt;
        fd = ft;
    else
        xd = xac;
        fd = fac;
    end;
end;
return;
end

We may also consider a hybrid approach by a combination of deterministic
descent methods and the stochastic descent methods, based on version 3 of the
fundamental theorem.
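Listing 7.3 calls a subprogram teta_fading which is not reproduced in the text. The lines below sketch one possible choice, given here purely as an assumption: a sequence $\theta_n = a/\sqrt{\log(b + cn)}$, which decreases to zero at the slow rate mentioned for the dynamics of Metropolis. The numerical values of the parameters are arbitrary. The sketch also evaluates the acceptance probability $c_n$ of a trial point that is worse than the current point.

```matlab
% Hypothetical cooling schedule (teta_fading is not given in the text):
% theta_n = a / sqrt(log(b + c*n)), decreasing slowly to 0 for b > 1, c > 0.
teta_fading = @(a, b, c, nit) a ./ sqrt(log(b + c*nit));
teta = @(nit) teta_fading(1, 2, 1, nit);   % arbitrary parameters a, b, c
aalfa = 0.1;  bbeta = 0.1;                 % arbitrary alpha > 0, beta > 0
% probability of accepting a trial point with ft > fac, as in metropolis_dynamics
accept_prob = @(fac, ft, nit) exp(-(bbeta + (ft - fac + aalfa) ./ teta(nit)));
p_early = accept_prob(1.0, 1.5, 1);        % first iterations: frequent acceptance
p_late = accept_prob(1.0, 1.5, 1e6);       % late iterations: rare acceptance
fprintf('c_n at n = 1: %g, at n = 1e6: %g\n', p_early, p_late);
```

With such a schedule, the dynamics accepts degradations often at the beginning of the iterations and behaves more and more like the elitist dynamics as $n$ increases.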


7.2.1. Version 1: Stochastic descent

In this section, we present the fundamental result that justifies the stochastic 
descent method. 

Theorem 7.1.- Let $\{U_n\}_{n \in \mathbb{N}}$ be a sequence of random variables
such that

$$\forall n \ge 0 : U_{n+1} \le U_n \text{ a.s.}$$

$$\forall n \ge 0 : U_n \ge m \text{ a.s.}$$

$$\forall \lambda > m : \exists\, \alpha(\lambda) > 0 \text{ such that } P(U_{n+1} < \lambda \mid U_n \ge \lambda) \ge \alpha(\lambda). \qquad [7.16]$$

Then:

$$U_n \to m \text{ a.s.} \ \blacksquare$$

We remark that condition [7.16] implies that $P(U_n \ge \lambda) > 0$. In the
following, we establish that this inequality is satisfied when the probability
density of the trial points drawn is strictly positive on $\mathbb{R}^n$.

The proof of the theorem uses the following lemmas:

Lemma 7.1.- Let $\{E_n\}_{n \in \mathbb{N}} \subset \Omega$ be a family of almost
sure events. Then, $E = \bigcap_{n \in \mathbb{N}} E_n$ is almost sure. ■

PROOF.- Let $E_n^c = \Omega - E_n$ be the complement of the event $E_n$. We have

$$\forall n \ge 0 : P(E_n^c) = 1 - P(E_n) = 1 - 1 = 0.$$

Let $E^c = \Omega - E$ be the complement of the event $E$. We have

$$E^c = \bigcup_{n \in \mathbb{N}} E_n^c,$$

so that

$$P(E^c) = P\left(\bigcup_{n \in \mathbb{N}} E_n^c\right) \le \sum_{n=0}^{+\infty} P(E_n^c) = \sum_{n=0}^{+\infty} 0 = 0$$

and $P(E) = 1 - P(E^c) = 1 - 0 = 1$. ■

Lemma 7.2.- Let $\{U_n\}_{n \in \mathbb{N}}$ be a sequence of random variables
such that

$$\forall n \ge 0 : U_{n+1} \le U_n \text{ a.s.}$$

$$\forall n \ge 0 : U_n \ge m \text{ a.s.}$$

Then there exists a random variable $U_\infty$ such that $U_n \to U_\infty$ a.s.
and $U_\infty \ge m$ a.s. In addition, $U_n \ge U_\infty$ a.s. for any $n \ge 0$. ■

Proof.-

1) Let us consider

$$A_n = \{\omega \in \Omega : U_{n+1}(\omega) \le U_n(\omega)\}, \quad B_n = \{\omega \in \Omega : U_n(\omega) \ge m\}.$$

By denoting $A_n^c = \Omega - A_n$, $B_n^c = \Omega - B_n$ (complements of $A_n$
and $B_n$, respectively), we have $P(A_n^c) = P(B_n^c) = 0$. Let us introduce
$E_n = A_n \cap B_n$. Then, $E_n^c = (A_n \cap B_n)^c = A_n^c \cup B_n^c$, so that

$$P(E_n^c) = P(A_n^c \cup B_n^c) \le P(A_n^c) + P(B_n^c) = 0 + 0 = 0$$

and $E_n$ is almost sure, since $P(E_n) = 1 - P(E_n^c) = 1 - 0 = 1$. Let us
consider

$$E = \{\omega \in \Omega : m \le U_{n+1}(\omega) \le U_n(\omega),\ \forall n \ge 0\}.$$

Then, $E = \bigcap_{n \in \mathbb{N}} E_n$, and it follows from Lemma 7.1 that $E$
is almost sure.

2) Let $\omega \in E$. Then, $\{U_n(\omega)\}_{n \in \mathbb{N}} \subset \mathbb{R}$
is decreasing and bounded from below, so that there exists $U_\infty(\omega)$
such that

$$U_n(\omega) \to U_\infty(\omega) \text{ for } n \to +\infty.$$

In this way, we define a numerical function $U_\infty : \Omega \to \mathbb{R}$.
The basic properties of measurable applications (see, for instance, [HAL 64],
p. 184) show that $U_\infty$ is a random variable. In addition,
$U_\infty(\omega) \ge m$, since

$$m \le U_n(\omega),\ \forall n \ge 0.$$

3) Let

$$F = \{\omega \in \Omega : U_n(\omega) \to U_\infty(\omega) \text{ for } n \to +\infty\}.$$

We have $E \subset F$, so that $P(F) \ge P(E) = 1$ and $F$ is almost sure. Let

$$G = \{\omega \in \Omega : U_\infty(\omega) \ge m\}.$$

We have $E \subset G$, so that $P(G) \ge P(E) = 1$ and $G$ is almost sure.

4) Let us consider

$$H_n = \{\omega \in \Omega : U_n(\omega) \ge U_\infty(\omega)\}.$$

Let $\omega \in E$. Since $\{U_n(\omega)\}_{n \in \mathbb{N}} \subset \mathbb{R}$ is
decreasing and bounded from below, we have $U_\infty(\omega) \le U_n(\omega)$,
$\forall n \ge 0$. Thus, $E \subset H_n$, so that $P(H_n) \ge P(E) = 1$. So,
$H_n$ is almost sure for any $n \ge 0$. ■

Lemma 7.3.- Let $U_\infty$ be a random variable such that

$$\forall \lambda > m : U_\infty < \lambda \text{ a.s.}, \qquad [7.17]$$

$$U_\infty \ge m \text{ a.s.} \qquad [7.18]$$

Then, $U_\infty = m$ a.s. ■

Proof.-

1) Let us introduce

$$A = \{\omega \in \Omega : U_\infty(\omega) = m\}, \quad B = \{\omega \in \Omega : U_\infty(\omega) > m\}.$$

Assumption [7.18] shows that:

$$P(A \cup B) = 1.$$

Since

$$A \cap B = \emptyset,$$

we have

$$P(A \cup B) = P(A) + P(B).$$

So,

$$P(A) + P(B) = 1.$$

2) Let us consider, for $n > 0$,

$$B_n = \left\{\omega \in \Omega : U_\infty(\omega) \ge m + \frac{1}{n}\right\}.$$

We have

$$\omega \in B \implies \exists\, n > 0 \text{ such that } U_\infty(\omega) \ge m + \frac{1}{n} \implies \omega \in \bigcup_{n > 0} B_n.$$

Thus, on the one hand,

$$B \subset \bigcup_{n > 0} B_n.$$

On the other hand,

$$B_n \subset B,\ \forall n > 0 \implies \bigcup_{n > 0} B_n \subset B,$$

so that

$$B = \bigcup_{n > 0} B_n.$$

3) The first assumption of the lemma (equation [7.17]) yields that
$B_n$ is almost impossible, $\forall n > 0$.

Thus:

$$P(B) = P\left(\bigcup_{n > 0} B_n\right) \le \sum_{n > 0} P(B_n) = 0$$

and $B$ is almost impossible. So,

$$P(A) = 1 - P(B) = 1 - 0 = 1$$

and $A$ is almost sure. □

Lemma 7.4.- Under the assumptions of Theorem 7.1, we have 
VA > m : P (U n > A) — > 0 quando n — > +oo. ■ 

PROOF.- Let us consider the random variable Z n given by 
Z n = 0, if U n > A, Z n = 1, if U n < A. 

We have 

P (Z n+ 1 = i) = P (Z n . |_i = i, Z n = 0) + P (Z n+ 1 = i, Z n = 1) , 

so that 

P (Z n+ i = i) = P (Z n = 0) P (Z n+ 1 =i \Z n = 0) 

+P ( Z n = 1) P (Z n + 1 = i | Z n = 1) 

and, by taking 


7T, 


P(Zn = 0) 
P(Z n = 1) 
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we obtain 

7i"n+ 1 — At T n , Aqj — P {Z n - |_i — i | Z n — j ) (0<i,j < 1). 

Since the sequence is decreasing a.s.,

$$A_{01} = P(Z_{n+1} = 0 \mid Z_n = 1) = P(U_{n+1} \ge \lambda \mid U_n < \lambda) = 0.$$

Moreover, from the assumptions of the theorem,

$$A_{00} = P(Z_{n+1} = 0 \mid Z_n = 0) = 1 - P(U_{n+1} < \lambda \mid U_n \ge \lambda) \le 1 - \alpha(\lambda).$$

Thus:

$$\forall n \ge 0 : (\pi_{n+1})_0 = A_{00} (\pi_n)_0 + A_{01} (\pi_n)_1 = A_{00} (\pi_n)_0 \le (1 - \alpha(\lambda)) (\pi_n)_0.$$

This recurrence shows that

$$\forall n \ge 0 : (\pi_{n+1})_0 \le (1 - \alpha(\lambda))^{n+1} (\pi_0)_0.$$

Consequently, given that $\alpha(\lambda) > 0$,

$$(\pi_{n+1})_0 \to 0 \text{ for } n \to +\infty.$$ ■
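The geometric bound obtained in this proof can be checked numerically. The following Matlab sketch is an illustration only: the value `alpha = 0.2`, the initial vector and the worst case $A_{00} = 1 - \alpha(\lambda)$ are assumptions chosen for the example and do not come from the text.

```matlab
% Worst-case two-state chain of Lemma 7.4 (illustrative values only).
alpha = 0.2;            % assumed lower bound alpha(lambda)
A = [1 - alpha, 0;      % first row:  A_00, A_01 = 0 (decreasing sequence)
     alpha,     1];     % second row: A_10, A_11
nmax = 50;
ppi = zeros(2, nmax + 1);
ppi(:, 1) = [1; 0];     % assumed start: U_0 >= lambda with probability 1
for n = 1:nmax
    ppi(:, n + 1) = A * ppi(:, n);   % pi_{n+1} = A * pi_n
end
bound = (1 - alpha) .^ (0:nmax);     % (1 - alpha)^n * (pi_0)_0
fprintf('P(U_%d >= lambda) = %.3e, bound = %.3e\n', nmax, ppi(1, end), bound(end));
```

In this worst case the bound is attained, so the first component of `ppi` decays exactly at the geometric rate $(1 - \alpha)^n$.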

Lemma 7.5.- Under the assumptions of Theorem 7.1, there exists $U_\infty$ such that $U_n \to U_\infty$ in distribution. In addition,

$$\forall \lambda > m : P(U_\infty \ge \lambda) = 0.$$ ■

Proof.- 

1) Lemma 7.2 shows that there exists $U_\infty$ such that $U_n \to U_\infty$ a.s. Since $U_n \to U_\infty$ a.s., we also have $U_n \to U_\infty$ in probability and, as a result, $U_n \to U_\infty$ in distribution.

2) Let $F_n$ be the distribution of $U_n$ and $F_\infty$ be the distribution of $U_\infty$. The convergence in distribution implies that

$$P(U_n < \lambda) = F_n(\lambda) \to F_\infty(\lambda) = P(U_\infty < \lambda),$$

so that

$$P(U_n \ge \lambda) = 1 - F_n(\lambda) \to 1 - F_\infty(\lambda) = P(U_\infty \ge \lambda).$$

Thus:

$$P(U_\infty \ge \lambda) = \lim_{n \to +\infty} P(U_n \ge \lambda) = 0.$$ ■


Proof of Theorem 7.1.-

1) It follows from Lemma 7.2 that there exists a random variable $U_\infty$ such that $U_n \to U_\infty$ a.s. and $U_\infty \ge m$ a.s. As in the proof of this lemma, we consider

$$E = \{\omega \in \Omega : m \le U_{n+1}(\omega) \le U_n(\omega), \ \forall n \ge 0\}.$$

$E$ is almost sure and $U_n(\omega) \to U_\infty(\omega)$, for any $\omega \in E$.

2) Let $\lambda > m$: it follows from Lemma 7.5 that $U_\infty < \lambda$ a.s. Consequently, it follows from Lemma 7.3 that $U_\infty = m$ a.s. ■

Note 7.1.- An alternative proof that $U_\infty = m$ a.s. is the following:

- Let $\lambda > m$ and consider

$$A_n(\lambda) = \{\omega \in E : U_n(\omega) < \lambda\}.$$

We have $A_n(\lambda) \subset A_{n+1}(\lambda)$, $\forall n \ge 0$.

- Let

$$A(\lambda) = \{\omega \in E : U_\infty(\omega) < \lambda\}.$$

Since $U_\infty(\omega) \le U_n(\omega)$ (Lemma 7.2), we have

$$\omega \in A_n(\lambda) \implies U_\infty(\omega) \le U_n(\omega) < \lambda \implies \omega \in A(\lambda).$$

Thus $A_n(\lambda) \subset A(\lambda)$, $\forall n \ge 0$. So, by taking $p_n(\lambda) = P(A_n(\lambda))$, we have

$$P(A(\lambda)) \ge p_n(\lambda), \ \forall n \ge 0.$$ [7.19]

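- Denoting $A_n^c(\lambda) = \Omega - A_n(\lambda)$ and $B_n(\lambda) = A_{n+1}(\lambda) \cap A_n^c(\lambda)$, we have

$$A_n(\lambda) \cap B_n(\lambda) = \emptyset.$$

In addition:

$$A_{n+1}(\lambda) = A_{n+1}(\lambda) \cap \Omega = A_{n+1}(\lambda) \cap \left(A_n(\lambda) \cup A_n^c(\lambda)\right) = \underbrace{\left(A_{n+1}(\lambda) \cap A_n(\lambda)\right)}_{= A_n(\lambda)} \cup \underbrace{\left(A_{n+1}(\lambda) \cap A_n^c(\lambda)\right)}_{= B_n(\lambda)},$$

so that

$$A_{n+1}(\lambda) = A_n(\lambda) \cup B_n(\lambda)$$

and

$$P(A_{n+1}(\lambda)) = P(A_n(\lambda)) + P(B_n(\lambda)).$$ [7.20]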

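- We have

$$\forall n \ge 0 : P\left(A_{n+1}(\lambda) \mid A_n^c(\lambda)\right) = P(U_{n+1} < \lambda \mid U_n \ge \lambda) \ge \alpha(\lambda) > 0,$$

so that, for any $n \ge 0$: on the one hand, $p_n(\lambda) > 0$ and, on the other hand,

$$P(B_n(\lambda)) = P(A_n^c(\lambda))\, P\left(A_{n+1}(\lambda) \mid A_n^c(\lambda)\right) = (1 - p_n(\lambda))\, P\left(A_{n+1}(\lambda) \mid A_n^c(\lambda)\right) \ge (1 - p_n(\lambda))\, \alpha(\lambda).$$

- Thus, equation [7.20] shows that

$$\forall n \ge 0 : p_{n+1}(\lambda) \ge p_n(\lambda) + (1 - p_n(\lambda))\, \alpha(\lambda).$$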



378 Uncertainty Quantification and Stochastic Modeling with Matlab® 


By setting $q_n(\lambda) = 1 - p_n(\lambda)$, we have

$$\forall n \ge 0 : q_{n+1}(\lambda) \le (1 - \alpha(\lambda))\, q_n(\lambda).$$

This recurrence shows that

$$\forall n \ge 0 : q_n(\lambda) \le (1 - \alpha(\lambda))^n q_0(\lambda).$$

Consequently, given that $\alpha(\lambda) > 0$,

$$q_n(\lambda) \to 0 \text{ and } p_n(\lambda) \to 1 \text{ for } n \to +\infty.$$

- So, equation [7.19] shows that $P(A(\lambda)) = 1$ and, as a result, $U_\infty < \lambda$ a.s.

- It follows from Lemma 7.3 that $U_\infty = m$ a.s. ■

This theorem suggests a method for the minimization of a continuous function $f$. Let us consider the following:

1) $Z : \Omega \to \mathbb{R}^k$, a continuous random vector having a probability density $\varphi_Z : \mathbb{R}^k \to \mathbb{R}$ such that ($|\cdot|$ is the Euclidean norm)

$$\forall r > 0 : \inf\{\varphi_Z(z) : |z| \le r\} \ge \alpha(r) > 0;$$

2) $x_0 : \Omega \to \mathbb{R}^k$, a given random vector.

Let us define a sequence $\{x_n\}_{n \in \mathbb{N}}$, $x_n : \Omega \to \mathbb{R}^k$, of random vectors as follows:

$$\forall n \ge 0 : x_{n+1}(\omega) = \arg\min\{f(x_n(\omega)), f(t_n(\omega))\}, \qquad t_n = x_n + Z,$$

and a second sequence $\{U_n\}_{n \in \mathbb{N}}$, $U_n : \Omega \to \mathbb{R}$, given by

$$U_n(\omega) = f(x_n(\omega)).$$

We observe that the conditional probability density of $t_n$ is

$$\varphi_n(t \mid x_n = x) = \varphi_Z(t - x) > 0$$

and the density of probability of $x_n$ is

$$\phi_n(t) = \int_{\mathbb{R}^k} \varphi_n(t \mid x_n = x)\, dx = \int_{\mathbb{R}^k} \varphi_Z(t - x)\, dx.$$

Thus:

$$\forall r > 0 : \phi_n(t) \ge \int_{|t - x| \le r} \varphi_Z(t - x)\, dx \ge \alpha(r)\, \ell(B_r) > 0,$$

where $\ell$ is the Lebesgue measure and $B_r$ is the ball of radius $r$ ($B_r = \{z \in \mathbb{R}^n : |z| \le r\}$).

We have 

Theorem 7.2.- $U_n \to m$ a.s. ■

Proof.- 

1) By construction:

$$\forall n \ge 0 : U_{n+1} \le U_n \text{ a.s.}$$

and

$$\forall n \ge 0 : U_n \ge m \text{ a.s.}$$
2) So, on the one hand,

$$\int_{S_\lambda} \varphi_n(t \mid x_n = x)\, dt \ge \int_{B(\lambda)} \varphi_n(t \mid x_n = x)\, dt = \int_{B(\lambda)} \varphi_Z(t - x)\, dt \ge \int_{B(\lambda)} \alpha(\eta_\lambda)\, dt$$

and ($\ell$ is the Lebesgue measure)

$$\int_{S_\lambda} \varphi_n(t \mid x_n = x)\, dt \ge \alpha(\eta_\lambda)\, \ell(B(\lambda)).$$

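3) On the other hand,

$$P(x_{n+1} \in S_\lambda, x_n \notin S_\lambda) = \int_{S_\lambda^c} \mu_n(dx) \int_{S_\lambda} \varphi_n(t \mid x_n = x)\, dt \ge \alpha(\eta_\lambda)\, \ell(B(\lambda)) \int_{S_\lambda^c} \mu_n(dx),$$

where $F_n$ is the distribution of $x_n$ and $\mu_n$ is the measure associated with $F_n$. Thus:

$$P(x_{n+1} \in S_\lambda, x_n \notin S_\lambda) \ge \alpha(\eta_\lambda)\, \ell(B(\lambda))\, P(x_n \notin S_\lambda).$$

From assumption [7.15], there exist $y_\lambda \in S_\lambda^c$ and $r_\lambda > 0$ such that

$$C(\lambda) = \{x \in \mathbb{R}^k : |x - y_\lambda| \le r_\lambda\} \subset S_\lambda^c.$$

So,

$$P(x_n \notin S_\lambda) = \int_{S_\lambda^c} \phi_n(x)\, dx \ge \int_{C(\lambda)} \phi_n(x)\, dx \ge \alpha(r_\lambda)\, \ell(B_{r_\lambda}) > 0$$

and, consequently,

$$P(x_{n+1} \in S_\lambda \mid x_n \notin S_\lambda) = \frac{P(x_{n+1} \in S_\lambda, x_n \notin S_\lambda)}{P(x_n \notin S_\lambda)} \ge \alpha(\eta_\lambda)\, \ell(B(\lambda)).$$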

4) By setting

$$\alpha(\lambda) = \alpha(\eta_\lambda)\, \ell(B(\lambda)),$$

we have

$$P(U_{n+1} < \lambda \mid U_n \ge \lambda) \ge \alpha(\lambda) > 0.$$

5) The result follows from Theorem 7.1. ■

As previously mentioned, the algorithm associated with this theorem is the stochastic descent method:

1) Initialization: a starting point $x_0$ and a maximum iteration number $nm$ are given. Set the iteration number to zero: $n \leftarrow 0$.

2) Iterations: the current point is $x_n$ and we determine $x_{n+1}$ in two substeps:

- drawing: we generate a variate $Z_n$ from $Z$ and we set $t_n = x_n + Z_n$.

- dynamics: determine $x_{n+1}$ as follows:

$$x_{n+1} = \begin{cases} x_n, & \text{if } f(x_n) \le f(t_n) \\ t_n, & \text{if } f(x_n) > f(t_n) \end{cases}.$$

3) Stopping test: if $n < nm$, then increment $n$: $n \leftarrow n + 1$ and return to step 2. Otherwise, stop the iterations and estimate $m \approx f(x_n)$ and $x^* \approx x_n$.
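The three steps above can be sketched in a few lines of Matlab. This is a minimal illustration under stated assumptions, not the book's implementation: the quadratic objective `f`, the Gaussian choice for $Z$ (whose density is strictly positive everywhere), the value of `sigma` and the iteration budget are all hypothetical.

```matlab
% Stochastic descent with elitist dynamics (sketch under stated assumptions).
f = @(x) sum((x - 1) .^ 2);   % assumed test objective, minimum m = 0 at x = (1,1)
k = 2;                        % dimension
sigma = 0.5;                  % assumed standard deviation of Z
nm = 2000;                    % maximum iteration number
x = zeros(k, 1);              % starting point x_0
fx = f(x);
fhist = zeros(nm + 1, 1);     % history of U_n = f(x_n)
fhist(1) = fx;
for n = 1:nm
    t = x + sigma * randn(k, 1);   % drawing: t_n = x_n + Z_n
    ft = f(t);
    if ft < fx                     % dynamics: keep t_n only if it improves f
        x = t;
        fx = ft;
    end
    fhist(n + 1) = fx;
end
fprintf('estimate of m: %g\n', fx);
```

The recorded values `fhist` are non-increasing by construction, which is the first assumption of Theorem 7.1.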


7.2.2. Version 2: Dynamics of Metropolis

As previously observed, the efficiency of the stochastic descent is poor. As a result, the literature contains modifications of the basic stochastic descent that aim to improve its efficiency. For instance, a first idea comes from the observation that the elitist dynamics rarely modifies the current point: the trial $t_n$ is rejected, except when it improves the value of the objective function $f$. Thus, in practice, the elitist dynamics confines itself to exploring neighborhoods of the current point through a large number of trials. These observations suggest that an improvement may consist of using a dynamics leading to a larger exploration of the working space, by accepting a controlled degradation of the objective function. An example of such a modification is furnished by the dynamics of Metropolis, which consists of accepting a degradation of the values of $f$ with a probability that rapidly decreases (for instance, exponentially) with the value of the degradation. The convergence of the resulting method of optimization is
based on the following theorem:

Theorem 7.3.- Let $\{U_n\}_{n \in \mathbb{N}}$ be a sequence of random variables such that

$$\forall n \ge 0 : P(U_{n+1} > U_n) \le \beta_n \text{ with } \sum_{n=0}^{+\infty} \beta_n < \infty;$$

$$\forall n \ge 0 : U_n \ge m \text{ a.s.;}$$

$$\forall \lambda > m : \exists\, \alpha(\lambda) > 0 \text{ such that } P(U_{n+1} < \lambda \mid U_n \ge \lambda) \ge \alpha(\lambda).$$ [7.21]

Then:

$$U_n \to m \text{ a.s.}$$ ■

As previously observed, Condition [7.21] implies that $P(U_n \ge \lambda) > 0$. As in the preceding situation, this inequality is satisfied by random generation of trial points such that the probability density is strictly positive everywhere on $\mathbb{R}^n$.

The proof uses the following lemmas: 

Lemma 7.6.- Let $\{E_n\}_{n \in \mathbb{N}} \subset \Omega$ be a family of events such that $P(E_n^c) \le \beta_n$, with $\sum_{n=0}^{+\infty} \beta_n < \infty$. If $E \subset \Omega$ is an event such that

$$\forall n \ge 0 : \bigcap_{i=n}^{+\infty} E_i \subset E,$$

then $E$ is almost sure. ■

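Proof.- Initially, we observe that the convergence of the series ($\sum_{n=0}^{+\infty} \beta_n < \infty$) implies that the series of the residuals converges to zero:

$$R_n = \sum_{i=n}^{+\infty} \beta_i \to 0 \text{ for } n \to +\infty.$$

Let $E_n^c = \Omega - E_n$ and $E^c = \Omega - E$ be the complementary events of $E_n$ and $E$. We have

$$E^c \subset \left(\bigcap_{i=n}^{+\infty} E_i\right)^c = \bigcup_{i=n}^{+\infty} E_i^c,$$

so that

$$P(E^c) \le P\left(\bigcup_{i=n}^{+\infty} E_i^c\right) \le \sum_{i=n}^{+\infty} P(E_i^c) \le \sum_{i=n}^{+\infty} \beta_i = R_n.$$

Thus:

$$P(E^c) \le \lim_{n \to +\infty} R_n = 0,$$

and $P(E) = 1 - P(E^c) = 1 - 0 = 1$. ■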

Lemma 7.7.- Let $\{U_n\}_{n \in \mathbb{N}}$ be a sequence of random variables such that

$$\forall n \ge 0 : P(U_{n+1} > U_n) \le \beta_n \text{ with } \sum_{n=0}^{+\infty} \beta_n < \infty;$$

$$\forall n \ge 0 : U_n \ge m \text{ a.s.}$$

Then there exists a random variable $U_\infty$ such that $U_n \to U_\infty$ a.s. and $U_\infty \ge m$ a.s. ■

Proof.- 

1) Let us consider

$$A_n = \{\omega \in \Omega : U_{n+1}(\omega) \le U_n(\omega)\}, \qquad B_n = \{\omega \in \Omega : U_n(\omega) \ge m\}.$$

By denoting $A_n^c = \Omega - A_n$, $B_n^c = \Omega - B_n$ (complementary events of $A_n$ and $B_n$, respectively), we have $P(A_n^c) \le \beta_n$ and $P(B_n^c) = 0$. Let us introduce $E_n = A_n \cap B_n$. Then, $E_n^c = (A_n \cap B_n)^c = A_n^c \cup B_n^c$, so that

$$P(E_n^c) = P(A_n^c \cup B_n^c) \le P(A_n^c) + P(B_n^c) \le \beta_n + 0 = \beta_n,$$

and $P(E_n) \ge 1 - \beta_n$. Let us consider

$$E = \{\omega \in \Omega : \exists\, n_0(\omega) \text{ such that } m \le U_{n+1}(\omega) \le U_n(\omega), \ \forall n \ge n_0(\omega)\}.$$

We have

$$E = \bigcup_{n=0}^{+\infty} \bigcap_{i=n}^{+\infty} E_i,$$

so that it follows from Lemma 7.6 that $E$ is almost sure.


2) Let $\omega \in E$. Then, there exists $n \ge 0$ such that $\omega \in \bigcap_{i=n}^{+\infty} E_i$, so that

$$\forall i \ge n : m \le U_{i+1}(\omega) \le U_i(\omega).$$

Thus, $\{U_i(\omega)\}_{i \ge n} \subset \mathbb{R}$ is decreasing and bounded, so that there exists $U_\infty(\omega)$ verifying

$$U_i(\omega) \to U_\infty(\omega) \text{ for } i \to +\infty.$$

3) In this way, we define a numerical function $U_\infty : \Omega \to \mathbb{R}$. The basic properties of the measurable applications (see, for instance, [HAL 64], p. 184) show that $U_\infty$ is a random variable. In addition, $U_\infty(\omega) \ge m$, since

$$m \le U_i(\omega), \ \forall i \ge n.$$

4) Let

$$F = \{\omega \in \Omega : U_n(\omega) \to U_\infty(\omega) \text{ for } n \to +\infty\}.$$

We have $E \subset F$, so that $P(F) \ge P(E) = 1$ and $F$ is almost sure.

5) Let

$$G = \{\omega \in \Omega : U_\infty(\omega) \ge m\}.$$

We have $E \subset G$, so that $P(G) \ge P(E) = 1$ and $G$ is almost sure. ■

Lemma 7.8.- Under the assumptions of Theorem 7.3, we have

$$\forall \lambda > m : P(U_n \ge \lambda) \to 0 \text{ for } n \to +\infty.$$ ■

Proof.-

1) Let us consider the random variable $Z_n$ given by

$$Z_n = 0 \text{ if } U_n \ge \lambda, \qquad Z_n = 1 \text{ if } U_n < \lambda.$$

We have

$$P(Z_{n+1} = i) = P(Z_{n+1} = i, Z_n = 0) + P(Z_{n+1} = i, Z_n = 1),$$

so that

$$P(Z_{n+1} = i) = P(Z_n = 0)\, P(Z_{n+1} = i \mid Z_n = 0) + P(Z_n = 1)\, P(Z_{n+1} = i \mid Z_n = 1)$$

and, by taking

$$\pi_n = \begin{pmatrix} P(Z_n = 0) \\ P(Z_n = 1) \end{pmatrix},$$

we obtain

$$\pi_{n+1} = A \pi_n, \qquad A_{ij} = P(Z_{n+1} = i \mid Z_n = j) \quad (0 \le i, j \le 1).$$
2) Let us consider the events

$$F = \{\omega \in \Omega : U_{n+1}(\omega) \ge \lambda \text{ and } U_n(\omega) < \lambda\}, \qquad G = \{\omega \in \Omega : U_{n+1}(\omega) > U_n(\omega)\}.$$

We have $F \subset G$, so that $P(F) \le P(G) \le \beta_n$. As a result,

$$A_{01} (\pi_n)_1 = P(Z_{n+1} = 0, Z_n = 1) = P(U_{n+1} \ge \lambda, U_n < \lambda) \le \beta_n.$$

3) Moreover,

$$A_{00} = P(Z_{n+1} = 0 \mid Z_n = 0) = 1 - P(U_{n+1} < \lambda \mid U_n \ge \lambda) \le 1 - \alpha(\lambda).$$

Thus:

$$\forall n \ge 0 : (\pi_{n+1})_0 = A_{00} (\pi_n)_0 + A_{01} (\pi_n)_1 \le (1 - \alpha(\lambda)) (\pi_n)_0 + \beta_n$$

and, therefore,

$$\forall n \ge k : (\pi_{n+1})_0 \le (1 - \alpha(\lambda))^{n+1-k} (\pi_k)_0 + \sum_{i=k}^{n} (1 - \alpha(\lambda))^{n-i} \beta_i.$$

Thus,

$$\forall n \ge k : (\pi_{n+1})_0 \le (1 - \alpha(\lambda))^{n+1-k} (\pi_k)_0 + \sum_{i=k}^{n} \beta_i,$$

which implies that

$$\forall n \ge k : (\pi_{n+1})_0 \le (1 - \alpha(\lambda))^{n+1-k} (\pi_k)_0 + \sum_{i=k}^{+\infty} \beta_i.$$

4) Hence

$$\forall k \ge 0 : \limsup_n\, (\pi_{n+1})_0 \le \sum_{i=k}^{+\infty} \beta_i.$$

Due to the convergence of the series, we have

$$R_k = \sum_{i=k}^{+\infty} \beta_i \to 0 \text{ for } k \to +\infty.$$

So $\limsup_n (\pi_{n+1})_0 \le 0$. Furthermore, $(\pi_{n+1})_0 \ge 0$, $\forall n \ge 0$, so that

$$\limsup_n\, (\pi_{n+1})_0 = 0.$$

5) Since

$$\liminf_n\, (\pi_{n+1})_0 \le \limsup_n\, (\pi_{n+1})_0,$$

we also have $\liminf_n (\pi_{n+1})_0 \le 0$. Since $(\pi_{n+1})_0 \ge 0$, $\forall n \ge 0$, we also have

$$\liminf_n\, (\pi_{n+1})_0 = 0.$$

6) The equality of the values of $\liminf$ and $\limsup$ shows that $(\pi_{n+1})_0 \to 0$ for $n \to +\infty$. ■

This result implies that

Lemma 7.9.- Under the assumptions of Theorem 7.3, there exists $U_\infty$ such that $U_n \to U_\infty$ in distribution. Moreover,

$$\forall \lambda > m : P(U_\infty \ge \lambda) = 0.$$ ■

Proof.- 

1) Lemma 7.7 shows that there exists $U_\infty$ such that $U_n \to U_\infty$ a.s. Since $U_n \to U_\infty$ a.s., we also have $U_n \to U_\infty$ in probability and, therefore, $U_n \to U_\infty$ in distribution.

2) Let $F_n$ be the cumulative distribution of $U_n$ and $F_\infty$ be the cumulative distribution of $U_\infty$. The convergence in distribution shows that

$$P(U_n < \lambda) = F_n(\lambda) \to F_\infty(\lambda) = P(U_\infty < \lambda),$$

so that

$$P(U_n \ge \lambda) = 1 - F_n(\lambda) \to 1 - F_\infty(\lambda) = P(U_\infty \ge \lambda).$$

Thus:

$$P(U_\infty \ge \lambda) = \lim_{n \to +\infty} P(U_n \ge \lambda) = 0.$$ ■


Proof of Theorem 7.3.-

1) It follows from Lemma 7.7 that there exists a random variable $U_\infty$ such that $U_n \to U_\infty$ a.s. and $U_\infty \ge m$ a.s. As in the proof of this lemma, we consider

$$E = \{\omega \in \Omega : \exists\, n_0(\omega) \text{ such that } m \le U_{n+1}(\omega) \le U_n(\omega), \ \forall n \ge n_0(\omega)\}.$$

$E$ is almost sure and $U_n(\omega) \to U_\infty(\omega)$, for any $\omega \in E$.

2) Let $\lambda > m$: it follows from Lemma 7.9 that $U_\infty < \lambda$ a.s. Thus, it follows from Lemma 7.3 that $U_\infty = m$ a.s. ■

This theorem suggests a method for the minimization of the objective function $f$. Let us consider the following:

1) $Z : \Omega \to \mathbb{R}^k$, a continuous random vector having a probability density $\varphi_Z : \mathbb{R}^k \to \mathbb{R}$ such that ($|\cdot|$ is the Euclidean norm)

$$\forall r > 0 : \inf\{\varphi_Z(z) : |z| \le r\} \ge \alpha(r) > 0;$$

2) $x_0 : \Omega \to \mathbb{R}^k$, a random vector;

3) two strictly positive real numbers $\alpha > 0$ and $\beta > 0$;

4) $\{\theta_n\}_{n \in \mathbb{N}}$, a sequence of strictly positive real numbers ($\theta_n > 0$, $\forall n \ge 0$) such that $\sum_{n=0}^{+\infty} \exp\left(-\alpha / \theta_n\right) < \infty$.

We define the sequence of random variables $\{x_n\}_{n \in \mathbb{N}}$, $x_n : \Omega \to \mathbb{R}^k$, as follows:

$$\forall n \ge 0 : x_{n+1}(\omega) = D_n(\omega)\, x_n(\omega) + (1 - D_n(\omega))\, t_n(\omega), \qquad t_n = x_n + Z,$$

where $D_n$ takes the values 0 and 1 with the conditional probabilities

$$P(D_n = 0 \mid f(x_n) < f(t_n)) = \beta_n; \quad P(D_n = 1 \mid f(x_n) < f(t_n)) = 1 - \beta_n;$$

$$P(D_n = 0 \mid f(x_n) \ge f(t_n)) = 1; \quad P(D_n = 1 \mid f(x_n) \ge f(t_n)) = 0;$$

$$\beta_n = \exp\left(-\left(\frac{\alpha}{\theta_n} + \beta\right)\right).$$

The sequence $\{U_n\}_{n \in \mathbb{N}}$, $U_n : \Omega \to \mathbb{R}$, is given by

$$U_n(\omega) = f(x_n(\omega)).$$

As in the preceding section, the conditional probability density of $t_n$ is

$$\varphi_n(t \mid x_n = x) = \varphi_Z(t - x)$$

and the density of probability of $x_n$ is

$$\phi_n(t) = \int_{\mathbb{R}^k} \varphi_n(t \mid x_n = x)\, dx = \int_{\mathbb{R}^k} \varphi_Z(t - x)\, dx.$$

This last equation shows that

$$\forall r > 0 : \phi_n(t) \ge \alpha(r)\, \ell(B_r) > 0,$$

where $\ell$ is the Lebesgue measure and $B_r$ is the ball of radius $r$ ($B_r = \{z \in \mathbb{R}^n : |z| \le r\}$).

We have 

Theorem 7.4.- $U_n \to m$ a.s. ■

Proof.- 

1) First, we observe that

$$\beta_n = \exp\left(-\left(\frac{\alpha}{\theta_n} + \beta\right)\right) \le \exp\left(-\frac{\alpha}{\theta_n}\right),$$

so that

$$\sum_{n=0}^{+\infty} \beta_n < \infty.$$

2) By construction:

$$\forall n \ge 0 : P(U_{n+1} > U_n) \le \beta_n$$

and

$$\forall n \ge 0 : U_n \ge m \text{ a.s.}$$

3) In a manner analogous to that used in the proof of Theorem 7.2, we have

$$P(U_{n+1} < \lambda \mid U_n \ge \lambda) \ge \alpha(\lambda) > 0.$$

4) The result follows from Theorem 7.3. ■

In practice, the following alternative is often used:

$$P(D_n = 0 \mid f(x_n) < f(t_n)) = c_n; \quad P(D_n = 1 \mid f(x_n) < f(t_n)) = 1 - c_n;$$

$$P(D_n = 0 \mid f(x_n) \ge f(t_n)) = 1; \quad P(D_n = 1 \mid f(x_n) \ge f(t_n)) = 0;$$

$$c_n = \exp\left(-\left(\frac{f(t_n) - f(x_n) + \alpha}{\theta_n} + \beta\right)\right).$$

We observe that, when $f(x_n) < f(t_n)$,

$$c_n \le \exp\left(-\left(\frac{\alpha}{\theta_n} + \beta\right)\right) \le \exp\left(-\frac{\alpha}{\theta_n}\right),$$

so that Theorem 7.4 still applies. The algorithm associated with this choice is the stochastic descent method with dynamics of Metropolis, which corresponds to the simulated annealing algorithm:

1) Initialization: a starting point $x_0$ and a number of trials $nm$ are given. Set the iteration number to zero: $n \leftarrow 0$;

2) Iterations: the current point is $x_n$ and we determine $x_{n+1}$ in two substeps:

- drawing: randomly generate a variate $Z_n$ from $Z$ and set $t_n = x_n + Z_n$.

- dynamics: determine $x_{n+1}$: if $f(x_n) \ge f(t_n)$ then $x_{n+1} = t_n$. Otherwise: generate a variate $a$ uniformly distributed on $(0, 1)$; if $a < c_n$ then $x_{n+1} = t_n$, else $x_{n+1} = x_n$.

3) Stopping test: if $n < nm$, then increment $n$: $n \leftarrow n + 1$ and return to step 2. Otherwise, stop the iterations and estimate $m \approx f(x_n)$ and $x^* \approx x_n$.
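The Metropolis dynamics can be sketched as follows. Everything specific in this Matlab fragment is an assumption made for illustration: the multimodal objective `f`, the Gaussian increments, the constants `a` and `b` (playing the roles of $\alpha$ and $\beta$), the schedule `theta`, and the exponential form of the acceptance probability `cn`. The schedule is chosen so that $\exp(-a/\theta_n) = (n+2)^{-2}$, which is summable. Keeping track of the best point seen is a convenience added here and is not part of the algorithm as stated.

```matlab
% Stochastic descent with Metropolis dynamics (sketch, assumed parameters).
f = @(x) sum(x .^ 2) + 2 * sum(1 - cos(3 * x));   % assumed multimodal objective
k = 2; sigma = 0.5; nm = 3000;
a = 1; b = 0.1;                        % assumed constants alpha > 0, beta > 0
theta = @(n) a / (2 * log(n + 2));     % assumed schedule: exp(-a/theta_n) = (n+2)^(-2)
x = 2 * ones(k, 1); fx = f(x);         % starting point x_0
xbest = x; fbest = fx;                 % best point seen so far (added convenience)
naccept_worse = 0;                     % number of accepted degradations
for n = 0:nm - 1
    t = x + sigma * randn(k, 1);       % drawing: t_n = x_n + Z_n
    ft = f(t);
    if ft <= fx
        x = t; fx = ft;                % no degradation: always accepted
    else
        cn = exp(-((ft - fx + a) / theta(n) + b));
        if rand() < cn                 % degradation: accepted with probability c_n
            x = t; fx = ft;
            naccept_worse = naccept_worse + 1;
        end
    end
    if fx < fbest
        xbest = x; fbest = fx;
    end
end
fprintf('f(x_n) = %g, best value = %g, accepted degradations = %d\n', fx, fbest, naccept_worse);
```

With a summable schedule such as this one, degradations are accepted only rarely, and less and less often as $n$ grows.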

Note 7.2.- The readers may find alternative convergence results in the 
literature, namely involving proofs based on stochastic diffusions (see, for 
instance, [AZE 88] and [GEM 86]). 


7.2.3. Version 3: Hybrid methods

Despite the modifications introduced, the method presented in the previous section remains entirely based on the random generation of trial points. In order to obtain a significant improvement in efficiency, namely for a large number of variables, a simple idea consists of combining the random generation of trial points with a deterministic method. For instance, let us consider descent iterations which read as follows:

$$x_{n+1} = Q_n(x_n).$$ [7.22]

For instance, gradient descent with a fixed step reads as

$$Q_n(x) = x - \mu \nabla f(x).$$

In terms of algorithms, the iterations read as follows:

1) Initialization: a starting point $x_0$ and a maximum iteration number $nm$ are given. Set the iteration number to zero: $n \leftarrow 0$;

2) Iterations: the current point is $x_n$ and we determine $x_{n+1}$ in two substeps:

- descent: we generate a new point $t_0 = Q_n(x_n)$ by using the descent method.

- dynamics: determine $x_{n+1}$ from $x_n$ and $t_0$. For instance, use a blind dynamics by setting $x_{n+1} = t_0$.

3) Stopping test: if $n < nm$, then increment $n$: $n \leftarrow n + 1$ and return to step 2. Otherwise, stop the iterations and estimate $m \approx f(x_n)$ and $x^* \approx x_n$.
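A minimal Matlab sketch of these deterministic iterations, with the blind dynamics, is given below. The quadratic objective, its gradient, the step `mu = 0.1` and the iteration budget are assumptions chosen only to make the example self-contained.

```matlab
% Deterministic descent x_{n+1} = Q_n(x_n) with a fixed-step gradient (sketch).
f = @(x) 0.5 * sum(x .^ 2) + sum(x);   % assumed smooth objective, minimum at x = -1
gradf = @(x) x + 1;                    % its gradient
mu = 0.1;                              % fixed step
Q = @(x) x - mu * gradf(x);            % Q_n(x) = x - mu * grad f(x)
nm = 200;                              % maximum iteration number
x = zeros(3, 1);                       % starting point x_0
for n = 1:nm
    t0 = Q(x);                         % descent substep
    x = t0;                            % blind dynamics: x_{n+1} = t_0
end
fprintf('f(x_n) = %g\n', f(x));
```

For this convex example the blind dynamics is sufficient; on a multimodal objective the same iterations would stop at the first stationary point they reach, which motivates the random perturbations discussed next.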

As observed, we may combine the descent iterations with the random generation of trial points. For instance, we may introduce an intermediary step in the basic iterations:

1) Initialization: a starting point $x_0$ and a maximum iteration number $nm$ are given. Set the iteration number to zero: $n \leftarrow 0$.

2) Iterations: the current point is $x_n$ and we determine $x_{n+1}$ in three substeps:

- descent: we generate a new point $t_0 = Q_n(x_n)$ by using the descent method;

- drawing: for $i = 1, \ldots, nr$: randomly generate an increment $(\Delta x_n)_i$ and set $t_i = t_0 + (\Delta x_n)_i$;

- dynamics: determine $x_{n+1}$ from $x_n$ and $t_i$, $i = 0, \ldots, nr$. For instance, use an elitist dynamics as follows:

$$x_{n+1} = \arg\min\{f(x_n), f(t_i) : 0 \le i \le nr\}.$$

3) Stopping test: if $n < nm$, then increment $n$: $n \leftarrow n + 1$ and return to step 2. Otherwise, stop the iterations and estimate $m \approx f(x_n)$ and $x^* \approx x_n$.

This kind of combination between a descent method and a stochastic method may be interpreted as a stochastic perturbation of a deterministic descent method. The basic general algorithm is as follows:

1) Initialization: a starting point $x_0$, a maximum number of trials $nr$ and a maximum iteration number $nm$ are given. Set the iteration number to zero: $n \leftarrow 0$;

2) Iterations: the current point is $x_n$ and we determine $x_{n+1}$ in three substeps:

- deterministic: generate a new point $t_0$ by applying the deterministic method to $x_n$;

- perturbation:

  - generation: for $i = 1, \ldots, nr$: randomly generate a perturbation $(\Delta x_n)_i$ and set $t_i = t_0 + (\Delta x_n)_i$;

  - selection: determine $\tilde{x}_n = \arg\min\{f(t_i) : 0 \le i \le nr\}$.

- dynamics: determine $x_{n+1}$ by using $x_n$ and $\tilde{x}_n$. For instance, use the elitist dynamics.

3) Stopping test: if $n < nm$, then increment $n$: $n \leftarrow n + 1$ and return to step 2. Otherwise, stop the iterations and estimate $m \approx f(x_n)$ and $x^* \approx x_n$.

Below is an example of Matlab implementation:

Listing 7.4. Stochastic perturbation of a descent method

function [xsol, fsol] = stoc_pert_iterations(f, x0, Q, nitmax, npert, alea, dynamics_pert, dynamics_it)
%
% performs nitmax iterations using iteration function Q
% involving npert perturbations at each iteration
% starting from x0
% assumes that x0 is n x 1
%
% IN:
% f: objective function - type anonymous function
% x0: n x 1 vector - type array of double
% Q: iteration function - type anonymous function
% nitmax: maximum iteration number - type integer
% npert: number of perturbations - type integer
% alea: generator of random points - type anonymous function
%       alea(n,nt) furnishes a n x nt table of random values
% dynamics_pert: the dynamics to be used to select the perturbation - type anonymous function
% dynamics_it: the dynamics to be used to select the new point - type anonymous function
%
xac = x0;
fac = f(x0);
for nit = 1:nitmax
    xc = Q(xac);
    fc = f(xc);
    for np = 1:npert
        xt = trial_point(xc, 1, alea);
        ft = f(xt);
        [xc, fc] = dynamics_pert(xc, fc, xt, ft, nit);
    end;
    [xac, fac] = dynamics_it(xac, fac, xc, fc, nit);
end;
xsol = xac;
fsol = fac;
return;
end


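Example 7.11.- Let us consider $\bar{x} = (1, 2, \ldots, n)$ and Griewank's function

$$F(x) = 1 + \frac{1}{4000} \|x - \bar{x}\|^2 - \prod_{i=1}^{n} \cos\left(\frac{x_i - \bar{x}_i}{\sqrt{i}}\right).$$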

We apply the method with $nr = 10$, $n = 5$. The starting point is 0 and we use a gradient descent with step $\mu = 0.1$. Stochastic perturbations are generated by variates from $N(0, \sigma)$, with $\sigma = 0.1$. The evolution of the objective function is shown in Figure 7.16.


Figure 7.16. Results for Griewank's function of dimension 5 with stochastic perturbation


This algorithm corresponds to the iterations

$$x_{n+1} = Q_n(x_n) + P_n,$$ [7.23]

where $P_n$ is a random variable. These iterations converge to a point of global minimum under quite general assumptions, such as

$$\forall M > 0 : \exists\, b(M) \text{ such that } |x| \le M \implies |Q_n(x)| \le b(M), \ \forall n \ge 0.$$ [7.24]

In addition to the preceding assumptions on $f$ (equations [7.13] and [7.14]), the random variable $P_n$ has to be conveniently chosen:

$$\forall \lambda > m : P(U_{n+1} < \lambda \mid U_n \ge \lambda) \ge \alpha(n, \lambda) > 0; \qquad \sum_{n=0}^{+\infty} \alpha(n, \lambda) = +\infty.$$ [7.25]

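These choices are based on the following result:

Theorem 7.5.- Let $\{U_n\}_{n \in \mathbb{N}}$ be a sequence of random variables such that

$$\forall n \ge 0 : U_{n+1} \le U_n \text{ a.s.;}$$

$$\forall n \ge 0 : U_n \ge m \text{ a.s.;}$$

$$\forall \lambda > m : \exists\, \alpha(n, \lambda) > 0 \text{ such that } P(U_{n+1} < \lambda \mid U_n \ge \lambda) \ge \alpha(n, \lambda)$$ [7.26]

and

$$\forall \lambda > m : \sum_{n=0}^{+\infty} \alpha(n, \lambda) = +\infty.$$

Then:

$$U_n \to m \text{ a.s.}$$ ■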

We observe again that Condition [7.26] implies that $P(U_n \ge \lambda) > 0$. This inequality is satisfied by trial points having a strictly positive probability density everywhere on $\mathbb{R}^n$.

The proof of the theorem uses the following lemma:

Lemma 7.10.- Under the assumptions of Theorem 7.5, we have

$$\forall \lambda > m : P(U_n \ge \lambda) \to 0 \text{ for } n \to +\infty.$$ ■

Proof.-

1) Let us consider the random variable $Z_n$ given by

$$Z_n = 0 \text{ if } U_n \ge \lambda, \qquad Z_n = 1 \text{ if } U_n < \lambda.$$

We have

$$P(Z_{n+1} = i) = P(Z_{n+1} = i, Z_n = 0) + P(Z_{n+1} = i, Z_n = 1),$$

so that

$$P(Z_{n+1} = i) = P(Z_n = 0)\, P(Z_{n+1} = i \mid Z_n = 0) + P(Z_n = 1)\, P(Z_{n+1} = i \mid Z_n = 1)$$

and, by setting

$$\pi_n = \begin{pmatrix} P(Z_n = 0) \\ P(Z_n = 1) \end{pmatrix},$$

we obtain

$$\pi_{n+1} = A \pi_n, \qquad A_{ij} = P(Z_{n+1} = i \mid Z_n = j) \quad (0 \le i, j \le 1).$$

2) Since the sequence is decreasing a.s., we have

$$P(Z_{n+1} = 0 \mid Z_n = 1) = 0, \qquad P(Z_{n+1} = 1 \mid Z_n = 1) = 1.$$

Thus:

$$A_{11} (\pi_n)_1 = (\pi_n)_1.$$

3) Moreover,

$$A_{10} = P(Z_{n+1} = 1 \mid Z_n = 0) = P(U_{n+1} < \lambda \mid U_n \ge \lambda) \ge \alpha(n, \lambda),$$

so that

$$\forall n \ge 0 : (\pi_{n+1})_1 = A_{10} (\pi_n)_0 + A_{11} (\pi_n)_1 \ge \alpha(n, \lambda) (\pi_n)_0 + (\pi_n)_1$$

and therefore:

$$\forall n \ge 0 : (\pi_{n+1})_1 \ge \alpha(n, \lambda) \left(1 - (\pi_n)_1\right) + (\pi_n)_1.$$

4) Since $\alpha(n, \lambda) \left(1 - (\pi_n)_1\right) \ge 0$, this inequality shows that

$$\forall n \ge 0 : (\pi_{n+1})_1 \ge (\pi_n)_1.$$

So, $\{(\pi_n)_1\}_{n \in \mathbb{N}}$ is increasing and upper-bounded by 1. Thus, there exists $p \le 1$ such that

$$(\pi_n)_1 \to p \text{ for } n \to +\infty.$$

Moreover,

$$\forall n \ge 0 : p \ge (\pi_n)_1.$$

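5) We have

$$\alpha(n, \lambda) \left(1 - (\pi_n)_1\right) \ge \alpha(n, \lambda) (1 - p),$$

so that

$$\forall n \ge 0 : (\pi_{n+1})_1 \ge \alpha(n, \lambda) (1 - p) + (\pi_n)_1.$$

Thus:

$$\forall n \ge 0 : (\pi_{n+1})_1 \ge (\pi_0)_1 + (1 - p) \sum_{i=0}^{n} \alpha(i, \lambda)$$

and

$$\forall n \ge 0 : 1 \ge (\pi_0)_1 + (1 - p) \sum_{i=0}^{n} \alpha(i, \lambda).$$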

6) Assume that $p < 1$. In this case, we have

$$\sum_{i=0}^{n} \alpha(i, \lambda) \le \frac{1 - (\pi_0)_1}{1 - p},$$

so that

$$+\infty = \sum_{i=0}^{+\infty} \alpha(i, \lambda) \le \frac{1 - (\pi_0)_1}{1 - p},$$

which is a contradiction. Consequently, $p = 1$ and we have $(\pi_{n+1})_1 \to 1$ for $n \to +\infty$, so that

$$(\pi_{n+1})_0 \to 0 \text{ for } n \to +\infty.$$ ■
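The role played by the divergence of the series in this proof can be illustrated numerically. The Matlab sketch below considers the worst case $(\pi_{n+1})_0 = (1 - \alpha(n, \lambda)) (\pi_n)_0$ with $(\pi_0)_0 = 1$ for two hypothetical sequences, neither of which comes from the text: $\alpha(n) = 0.5/(n+1)$, whose sum diverges, and $\alpha(n) = 0.5/(n+1)^2$, whose sum converges.

```matlab
% Worst case (pi_{n+1})_0 = (1 - alpha(n)) * (pi_n)_0 for two assumed sequences.
N = 100000;
n = (0:N - 1)';
alpha_div = 0.5 ./ (n + 1);         % assumed sequence, divergent sum
alpha_conv = 0.5 ./ (n + 1) .^ 2;   % assumed sequence, convergent sum
% product of the factors (1 - alpha(n)), computed through logarithms
p0_div = exp(sum(log(1 - alpha_div)));
p0_conv = exp(sum(log(1 - alpha_conv)));
fprintf('divergent sum: %.3e, convergent sum: %.3e\n', p0_div, p0_conv);
```

In the first case the probability of remaining above the level $\lambda$ tends to zero, whereas in the second it stays bounded away from zero, so that the conclusion of the lemma may fail when the series converges.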

Lemma 7.11.- Under the assumptions of Theorem 7.5, there exists $U_\infty$ such that $U_n \to U_\infty$ in distribution. Moreover,

$$\forall \lambda > m : P(U_\infty \ge \lambda) = 0.$$ ■


Proof.- 

1) Lemma 7.2 establishes the existence of $U_\infty$ such that $U_n \to U_\infty$ a.s. Since $U_n \to U_\infty$ a.s., we also have $U_n \to U_\infty$ in probability and, therefore, $U_n \to U_\infty$ in distribution.

2) Let $F_n$ be the cumulative distribution of $U_n$ and $F_\infty$ be the cumulative distribution of $U_\infty$. The convergence in distribution implies that

$$P(U_n < \lambda) = F_n(\lambda) \to F_\infty(\lambda) = P(U_\infty < \lambda),$$

so that

$$P(U_n \ge \lambda) = 1 - F_n(\lambda) \to 1 - F_\infty(\lambda) = P(U_\infty \ge \lambda).$$

Thus:

$$P(U_\infty \ge \lambda) = \lim_{n \to +\infty} P(U_n \ge \lambda) = 0.$$ ■


Proof of Theorem 7.5.-

1) It follows from Lemma 7.2 that there exists a variable $U_\infty$ such that $U_n \to U_\infty$ a.s. and $U_\infty \ge m$ a.s. Analogously to the proof of this lemma, we consider

$$E = \{\omega \in \Omega : m \le U_{n+1}(\omega) \le U_n(\omega), \ \forall n \ge 0\}.$$

$E$ is almost sure and $U_n(\omega) \to U_\infty(\omega)$, for any $\omega \in E$.

2) Let $\lambda > m$: it follows from Lemma 7.11 that $U_\infty < \lambda$ a.s. Consequently, it follows from Lemma 7.3 that $U_\infty = m$ a.s. ■


This theorem suggests an optimization method. Let us consider the following:

1) $Z : \Omega \to \mathbb{R}^k$, a continuous random vector having a probability density $\varphi_Z : \mathbb{R}^k \to \mathbb{R}$ such that ($|\cdot|$ is the Euclidean norm)

$$\forall r > 0 : \inf\{\varphi_Z(z) : |z| \le r\} \ge \alpha(r) > 0;$$

2) a random vector $x_0 : \Omega \to \mathbb{R}^k$;

3) $\{\lambda_n\}_{n \in \mathbb{N}}$, a sequence of strictly positive real numbers such that $\lambda_n > 0$, $\forall n \ge 0$ and

$$\sum_{n=0}^{+\infty} \frac{1}{\lambda_n^k}\, \alpha\left(\frac{\xi}{\lambda_n}\right) = +\infty, \ \forall \xi > 0.$$

The last condition is satisfied, for instance, when $0 < \lambda_n \le M$, $\forall n \ge 0$



Too, > 0. In this case: 



An alternative way to satisfy this condition consists of taking $0 < \lambda_n \le M$, $\forall n \ge 0$ and $\xi \mapsto \alpha(\xi)$ decreasing and strictly positive on $(0, +\infty)$. In this case:


a 




> 0 


+oo 



= Too. 


n = 0 
Let us define a sequence of random variables $\{x_n\}_{n \in \mathbb{N}}$, $x_n : \Omega \to \mathbb{R}^k$, as follows:

$$\forall n \ge 0 : x_{n+1}(\omega) = \arg\min\{f(x_n(\omega)), f(t_n(\omega))\}, \qquad t_n = Q_n(x_n) + \lambda_n Z,$$

and $\{U_n\}_{n \in \mathbb{N}}$, $U_n : \Omega \to \mathbb{R}$, given by

$$U_n(\omega) = f(x_n(\omega)).$$

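In this case, the conditional probability density of $t_n$ is

$$\varphi_n(t \mid x_n = x) = \frac{1}{\lambda_n^k}\, \varphi_Z\left(\frac{t - Q_n(x)}{\lambda_n}\right)$$

and the density of $t_n$ is

$$\phi_n(t) = \frac{1}{\lambda_n^k} \int_{\mathbb{R}^k} \varphi_Z\left(\frac{t - Q_n(x)}{\lambda_n}\right) dx \ge \frac{1}{\lambda_n^k} \int_{|t - Q_n(x)| \le r} \varphi_Z\left(\frac{t - Q_n(x)}{\lambda_n}\right) dx,$$

so that

$$\forall r > 0 : \phi_n(t) \ge \frac{1}{\lambda_n^k} \int_{|t - Q_n(x)| \le r} \varphi_Z\left(\frac{t - Q_n(x)}{\lambda_n}\right) dx \ge \frac{1}{\lambda_n^k}\, \alpha(r)\, \ell(B_r) > 0,$$

where $\ell$ is the Lebesgue measure and $B_r$ is a ball of radius $r$ ($B_r = \{z \in \mathbb{R}^n : |z| \le r\}$).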

We have 

Theorem 7.6.- $U_n \to m$ a.s. ■

Proof.-

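1) By construction:

$$\forall n \ge 0 : U_{n+1} \le U_n \text{ a.s.;}$$

$$\forall n \ge 0 : U_n \ge m \text{ a.s.}$$

2) Let $\lambda > m$. Let us consider $S_\lambda = \{x \in \mathbb{R}^k : f(x) < \lambda\}$ and $S_\lambda^c = \{x \in \mathbb{R}^k : f(x) \ge \lambda\}$. From assumption [7.14], there exist $x_\lambda \in S_\lambda$ and $\eta_\lambda > 0$ such that

$$B(\lambda) = \{x \in \mathbb{R}^k : |x - x_\lambda| \le \eta_\lambda\} \subset S_\lambda.$$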

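3) Let $\gamma > f(x_0)$: $S_\gamma$ is bounded so that there is $M_0 > 0$ such that $|x| \le M_0$ for all $x \in S_\gamma$. Thus, $|Q_n(x)| \le b(M_0)$ for all $x \in S_\gamma$. So

$$\int_{S_\lambda} \varphi_n(t \mid x_n = x)\, dt \ge \int_{B(\lambda)} \varphi_n(t \mid x_n = x)\, dt = \frac{1}{\lambda_n^k} \int_{B(\lambda)} \varphi_Z\left(\frac{t - Q_n(x)}{\lambda_n}\right) dt$$

verifies ($\ell$ is the Lebesgue measure)

$$\int_{S_\lambda} \varphi_n(t \mid x_n = x)\, dt \ge \frac{1}{\lambda_n^k} \int_{B(\lambda)} \alpha\left(\frac{\eta_\lambda + b(M_0)}{\lambda_n}\right) dt \ge \frac{1}{\lambda_n^k}\, \alpha\left(\frac{\eta_\lambda + b(M_0)}{\lambda_n}\right) \ell(B(\lambda)).$$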


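4) Let us denote by $F_n$ the cumulative distribution of $x_n$ and by $\mu_n$ the measure associated with $F_n$. We have

$$P(x_{n+1} \in S_\lambda, x_n \notin S_\lambda) = \int_{S_\lambda^c} \mu_n(dx) \int_{S_\lambda} \varphi_n(t \mid x_n = x)\, dt \ge \int_{S_\lambda^c} \mu_n(dx) \int_{B(\lambda)} \varphi_n(t \mid x_n = x)\, dt,$$

so that

$$P(x_{n+1} \in S_\lambda, x_n \notin S_\lambda) \ge \frac{1}{\lambda_n^k}\, \alpha\left(\frac{\eta_\lambda + b(M_0)}{\lambda_n}\right) \ell(B(\lambda)) \int_{S_\lambda^c} \mu_n(dx),$$

that is

$$P(x_{n+1} \in S_\lambda, x_n \notin S_\lambda) \ge \frac{1}{\lambda_n^k}\, \alpha\left(\frac{\eta_\lambda + b(M_0)}{\lambda_n}\right) \ell(B(\lambda))\, P(x_n \notin S_\lambda).$$

By assumption [7.15], there are $y_\lambda \in S_\lambda^c$ and $r_\lambda > 0$ such that

$$C(\lambda) = \{x \in \mathbb{R}^k : |x - y_\lambda| \le r_\lambda\} \subset S_\lambda^c.$$

Thus:

$$P(x_n \notin S_\lambda) = \int_{S_\lambda^c} \phi_n(x)\, dx \ge \int_{C(\lambda)} \phi_n(x)\, dx \ge \alpha(r_\lambda)\, \ell(B_{r_\lambda}) > 0$$

and, consequently,

$$P(x_{n+1} \in S_\lambda \mid x_n \notin S_\lambda) = \frac{P(x_{n+1} \in S_\lambda, x_n \notin S_\lambda)}{P(x_n \notin S_\lambda)}.$$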


5) By setting

$$\alpha(n, \lambda) = \frac{1}{\lambda_n^k}\, \alpha\left(\frac{\eta_\lambda + b(M_0)}{\lambda_n}\right) \ell(B(\lambda)),$$

we have

$$P(U_{n+1} < \lambda \mid U_n \ge \lambda) \ge \alpha(n, \lambda) > 0$$

and

$$\sum_{n=0}^{+\infty} \alpha(n, \lambda) = +\infty.$$

6) The result follows from Theorem 7.5. ■
The algorithm associated with this theorem is the stochastic perturbation method:

1) Initialization: a starting point $x_0$, a number of trial points $nr$ and a maximum iteration number $nm$ are given. Set the iteration number to zero: $n \leftarrow 0$;

2) Iterations: the current point is $x_n$ and we determine $x_{n+1}$ in three substeps:

- descent: generate $t_0 = Q_n(x_n)$;

- drawing: for $i = 1, \ldots, nr$: randomly generate a variate $Z_i$ from $Z$ and set $t_i = t_0 + \lambda_n Z_i$;

- dynamics: determine $x_{n+1} = \arg\min\{f(x_n), f(t_i) : i = 0, \ldots, nr\}$.

3) Stopping test: if $n < nm$, then increment $n$: $n \leftarrow n + 1$ and return to step 2. Otherwise, stop the iterations and estimate $m \approx f(x_n)$ and $x^* \approx x_n$.
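The three substeps above can be sketched in Matlab as follows. This is an illustration under stated assumptions rather than the implementation of Listing 7.4: the multimodal objective, its gradient, the descent step `mu`, the Gaussian variates and the slowly decreasing sequence `lambda` are all hypothetical choices.

```matlab
% Stochastic perturbation of a fixed-step gradient descent (sketch).
f = @(x) sum(x .^ 2) + 2 * sum(1 - cos(3 * x));  % assumed multimodal objective
gradf = @(x) 2 * x + 6 * sin(3 * x);             % its gradient
mu = 0.02;                                       % assumed descent step
Q = @(x) x - mu * gradf(x);                      % Q_n(x) = x - mu * grad f(x)
lambda = @(n) 1 / sqrt(log(n + 2));              % assumed bounded sequence lambda_n
k = 2; nr = 10; nm = 500;
x = 2 * ones(k, 1); fx = f(x);                   % starting point x_0
fhist = zeros(nm + 1, 1); fhist(1) = fx;
for n = 0:nm - 1
    t0 = Q(x);                                   % descent: t_0 = Q_n(x_n)
    % drawing: t_i = t_0 + lambda_n * Z_i, stored with t_0 in the first column
    T = [t0, repmat(t0, 1, nr) + lambda(n) * randn(k, nr)];
    ft = zeros(1, nr + 1);
    for i = 1:nr + 1
        ft(i) = f(T(:, i));
    end
    [fmin, imin] = min(ft);
    if fmin < fx                                 % elitist dynamics over x_n, t_0, ..., t_nr
        x = T(:, imin); fx = fmin;
    end
    fhist(n + 2) = fx;
end
fprintf('estimate of m: %g\n', fx);
```

The deterministic point $t_0$ competes with its perturbed copies at each iteration, so the descent step is kept whenever no perturbation does better.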

Note 7.3.- In this situation, the dynamics of Metropolis may be used in step 2.3.

Note 7.4.- The readers may find other developments in the literature, such 
as methods for non-differentiable objective functions and methods for 
constrained optimization (see, for instance, [POG 94, AUT 97, DE 04b, 
DE 04a, MOU 06] and [ESS 09]). 

7.3. Population-based methods 

The methods presented in the last section may be applied to a set, i.e. a population, of initial points. For instance, let us consider an initial population formed by $np$ elements:

$$\Pi_0 = \{x_0^1, \ldots, x_0^{np}\}.$$

A simple idea consists of applying the methods to each element $x_0^i$ in such a manner that a sequence of populations $\{\Pi_n\}_{n \in \mathbb{N}}$ is generated:

$$\Pi_n = \{x_n^1, \ldots, x_n^{np}\}.$$

In such an approach, each $x_0^i$ is used as a starting point, and the resulting algorithm is often referred to as the multistart method. Assuming that the table xpop contains npop starting points (each column xpop(:, i) corresponds to a starting point), it may be implemented in Matlab as follows:

Listing 7.5. Multistart method

function [xsol, fsol] = multistart_pert(f, xpop, Q, nitmax, npert, alea, dynamics_pert, dynamics_it)
xpopsol = zeros(size(xpop));
npop = size(xpop, 2);
fpopsol = zeros(npop, 1);
for np = 1:npop
    x0 = xpop(:, np);
    [xs, fs] = stoc_pert_iterations(f, x0, Q, nitmax, npert, alea, dynamics_pert, dynamics_it);
    xpopsol(:, np) = xs;
    fpopsol(np) = fs;
end;
[m, ind] = min(fpopsol);
xsol = xpopsol(:, ind);
fsol = m;
return;
end
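The multistart principle can also be tried without the helper functions of the listings. In the following self-contained sketch, a plain stochastic descent stands in for the local method; the objective, the three starting points, `sigma` and the iteration budget are assumptions made for the example.

```matlab
% Multistart sketch: run a simple stochastic descent from each column of xpop.
f = @(x) sum(x .^ 2) + 2 * sum(1 - cos(3 * x));  % assumed objective
xpop = [-3 0.5 2.5; 3 -2 1];                     % assumed starting points (one per column)
npop = size(xpop, 2);
nitmax = 300; sigma = 0.3;
xpopsol = zeros(size(xpop));
fpopsol = zeros(npop, 1);
for np = 1:npop
    x = xpop(:, np); fx = f(x);
    for nit = 1:nitmax                           % stand-in for the local method
        t = x + sigma * randn(size(x));
        ft = f(t);
        if ft < fx
            x = t; fx = ft;
        end
    end
    xpopsol(:, np) = x;
    fpopsol(np) = fx;
end
[fsol, ind] = min(fpopsol);                      % keep the best final point
xsol = xpopsol(:, ind);
fprintf('best value over %d starts: %g\n', npop, fsol);
```

Each start is processed independently of the others, so the runs could be carried out in any order.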


More sophisticated approaches may involve combinations of the population 
members. For instance, we may consider the following algorithm: 

1) Initialization: we are given the initial population $\Pi_0$ formed by np 
elements, a number of trial points nr and a maximum iteration number nm. 
Set the iteration number to zero: $n \leftarrow 0$; 

2) Iterations: the actual population is $\Pi_n$ and we evaluate $\Pi_{n+1}$ in three 
substeps: 

- deterministic: generate $M_n^0 = \{t_0^1, \ldots, t_0^{np}\}$, where $t_0^i = Q_n(x_n^i)$; 

- drawing: for $i = 1, \ldots, np$ and $j = 1, \ldots, nr$: randomly generate an 
increment $(\Delta x_n)_j^i$ and set $t_j^i = t_0^i + (\Delta x_n)_j^i$. This substep generates 
$M_n^1 = \{t_j^i : i = 1, \ldots, np \text{ and } j = 0, \ldots, nr\}$; 

- dynamics: determine the new population $\Pi_{n+1}$ as a part of $\Pi_n \cup M_n$, 
$M_n = M_n^0 \cup M_n^1$. For instance, the elitist dynamics consists of the selection of 
the np best elements of $\Pi_n \cup M_n$. 

3) Stopping test: if $n < nm$, then increment n: $n \leftarrow n + 1$ and return to 
step 2. Otherwise, stop iterations and estimate m by the best value of f on $\Pi_n$ 
and $x^*$ as one of the elements corresponding to this value. 
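
The three substeps above can be sketched in a few lines of Matlab. Everything specific in this sketch is an assumption made for illustration: the objective f, the deterministic map Q (a simple contraction), the Gaussian increments and the parameter values.

```matlab
% Sketch of the population iterations with elitist dynamics (hypothetical data).
f = @(x) sum(x.^2, 1);                  % objective, evaluated column by column
Q = @(x) 0.9 * x;                       % hypothetical deterministic step Q_n
np = 6; nr = 4; nm = 30; sigma = 0.3;   % population size, trials, iterations, spread
P = 4 * (rand(2, np) - 0.5);            % initial population Pi_0 (one element per column)
fbest0 = min(f(P));
for n = 1:nm
    T0 = Q(P);                                           % deterministic substep
    T1 = repmat(T0, 1, nr) + sigma * randn(2, np * nr);  % drawing substep
    A = [P, T0, T1];                                     % available elements
    [~, ind] = sort(f(A));                               % elitist dynamics:
    P = A(:, ind(1:np));                                 % keep the np best
end
fbest = min(f(P));
```

Because the current population takes part in the selection, the best objective value cannot increase from one iteration to the next.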


Various modifications of this basic algorithm may be found in the 
literature. For instance, it is possible to define $M_n$ by choosing the best 
element of $\{t_j^i : j = 0, \ldots, nr\}$ for each fixed i: the union of the results for 
$i = 1, \ldots, np$ forms $M_n$. We may also introduce supplementary substeps. For 
instance, we may introduce a substep where the available elements are 
combined in order to generate new elements. An example is furnished by the 
generation of random affine combinations of the elements of $\Pi_n$: we may 
generate the supplementary set 

$$C_n = \{\alpha^{ij} x_n^i + \beta^{ij} x_n^j + \gamma^{ij} : i, j, \alpha^{ij}, \beta^{ij}, \gamma^{ij} \text{ random}\}.$$

The elements of $C_n$ may be used in a substep placed at the point chosen by 
the user. For instance, we may modify the dynamics in order to select the np 
best elements of $\Pi_n \cup M_n \cup C_n$. 

An example of implementation is given below: 

Listing 7.6. Population-based method involving affine combinations 


function [xpop, fpop, err] = pop_based_iterations(f, xpop_ini, Q, nitmax, npert, alea, dynamics_pert, dynamics_it, cmin, cmax, n_r)
    xpop = xpop_ini;
    npop = size(xpop, 2);
    fpop = zeros(npop, 1);
    for i = 1:npop
        fpop(i) = f(xpop(:, i));
    end
    for nit = 1:nitmax
        % affine combinations of the current population
        [xpop_r, fpop_r] = affine_comb(xpop, cmin, cmax, n_r, f);
        npop_r = length(fpop_r);
        nn = npop_r + npop;
        xpop_m = zeros(size(xpop, 1), nn);
        xpop_m(:, 1:npop_r) = xpop_r;
        xpop_m(:, npop_r+1:npop_r+npop) = xpop;
        % perturbation of the combinations and of the current population
        [xpop_m, fpop_m] = mute_pop(f, xpop_m, Q, nitmax, npert, alea, dynamics_pert, dynamics_it);
        % elitist selection among all the available elements
        [xpop, fpop, err] = select_new_pop(xpop_r, fpop_r, xpop_m, fpop_m, xpop, fpop);
    end
    return;
end

function [xpop_m, fpop_m] = mute_pop(f, xpop, Q, nitmax, npert, alea, dynamics_pert, dynamics_it)
    %
    % generates the perturbations of the population xpop
    %
    npop = size(xpop, 2);
    xpop_m = zeros(size(xpop));
    fpop_m = zeros(npop, 1);
    for np = 1:npop
        x0 = xpop(:, np);
        [xs, fs] = stoc_pert_iterations(f, x0, Q, nitmax, npert, alea, dynamics_pert, dynamics_it);
        xpop_m(:, np) = xs;
        fpop_m(np) = fs;
    end
    return;
end

function [xpop_r, fpop_r] = affine_comb(xpop, cmin, cmax, n_r, f)
    %
    % generates n_r affine combinations of each element in xpop
    % coefficients are randomly chosen in (cmin, cmax)
    %
    % IN:
    % xpop: n x npop table of the actual population - type array of double
    % cmin, cmax: 3 x 1 table of real numbers - type array of double
    % n_r: number of combinations by element (n_r <= npop) - type integer
    %
    % OUT:
    % xpop_r: the combinations - type array of double
    % fpop_r: values of the objective - type array of double
    %
    npop = size(xpop, 2);
    nn = n_r * npop;
    xpop_r = zeros(size(xpop, 1), nn);
    for i = 1:npop
        x = xpop(:, i);
        ind = randperm(npop, n_r);   % n_r distinct partners for element i
        for j = 1:n_r
            a = cmin + rand(size(cmax)) .* (cmax - cmin);
            xpop_r(:, (i - 1) * n_r + j) = a(1) * x + a(2) * xpop(:, ind(j)) + a(3);
        end
    end
    fpop_r = zeros(nn, 1);
    for i = 1:nn
        fpop_r(i) = f(xpop_r(:, i));
    end
    return;
end

function [xpop_n, fpop_n, err] = select_new_pop(xpop_r, fpop_r, xpop_m, fpop_m, xpop_old, fpop_old)
    %
    % selects the best npop elements from the available ones
    %
    npop_r = size(xpop_r, 2);
    npop_m = size(xpop_m, 2);
    npop = size(xpop_old, 2);
    nn = npop_r + npop_m + npop;
    xpop_a = zeros(size(xpop_old, 1), nn);
    fpop_a = zeros(nn, 1);
    % gather combinations, perturbations and the old population
    xpop_a(:, 1:npop_r) = xpop_r;
    xpop_a(:, npop_r+1:npop_r+npop_m) = xpop_m;
    xpop_a(:, npop_r+npop_m+1:npop_r+npop_m+npop) = xpop_old;
    fpop_a(1:npop_r) = fpop_r;
    fpop_a(npop_r+1:npop_r+npop_m) = fpop_m;
    fpop_a(npop_r+npop_m+1:npop_r+npop_m+npop) = fpop_old;
    % sort by objective value and keep the npop best
    [fpop_a, ind] = sort(fpop_a);
    xpop_n = xpop_a(:, ind(1:npop));
    fpop_n = fpop_a(1:npop);
    err = norm(xpop_n - xpop_old) / sqrt(npop);
    return;
end


7.4. Determination of starting points 

In this section, we establish a representation formula that may be used for 
the determination of starting points. It is interesting to notice that the 
representation established is valid in general Hilbert spaces, including 
infinite-dimensional spaces. As a model situation, let us consider a Hilbert 
space V and a functional $J : V \to \mathbb{R}$. We are interested in the determination 
of 

$$u = \arg\min_S J, \text{ i.e., } u \in S,\ J(u) \le J(v),\ \forall v \in S. \qquad [7.27]$$

We assume that $S \subset V$ is closed, bounded and not empty. Then, there is a 
constant $\alpha \in \mathbb{R}$ such that $\|v\| \le \alpha$, $\forall v \in S$. 

Let $J : V \to \mathbb{R}$ be a continuous functional. We assume that there is a 
constant $\beta \in \mathbb{R}$ such that $|J(v)| \le \beta$, $\forall v \in S$. This assumption is verified 
when J is bounded from below and is coercive. 

Let $B_\varepsilon^*$ be the part of S situated in the interior of the open ball having center 
u and radius $\varepsilon$. We denote by $S_\varepsilon^* = S - B_\varepsilon^*$ its complement in S. We assume 
that there is $\varepsilon_0 > 0$ such that $\mu(B_\varepsilon^*) > 0$, $\forall \varepsilon \in (0, \varepsilon_0)$. Let $\chi_\varepsilon^*$ be the 
characteristic function of $B_\varepsilon^*$ and $\psi_\varepsilon^*$ be the characteristic function of $S_\varepsilon^*$. We 
have $\chi_\varepsilon^*(v) + \psi_\varepsilon^*(v) = 1$, $\forall v \in S$. 

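Let $\lambda > 0$ be a real number large enough (in the following, we let $\lambda \to +\infty$) 
and $g : \mathbb{R}^2 \to \mathbb{R}$ be a continuous function such that $g > 0$. We 
assume that there are $\varepsilon_0 > 0$ and two functions $h_1, h_2 : \mathbb{R}^2 \to \mathbb{R}$ such that, 
$\forall \varepsilon \in (0, \varepsilon_0)$: 

$$E(g(\lambda, J(v))) \ge h_1(\lambda, \varepsilon) > 0; \quad E(\psi_\varepsilon^*(v)\, g(\lambda, J(v))) \le h_2(\lambda, \varepsilon) \qquad [7.30]$$

and 

$$\forall \varepsilon \in (0, \varepsilon_0): \quad \frac{h_2(\lambda, \varepsilon)}{h_1(\lambda, \varepsilon)} \xrightarrow[\lambda \to +\infty]{} 0. \qquad [7.31]$$

Then 

Theorem 7.7.- Assume that [7.30] and [7.31] are satisfied. If $V_d$ is a 
finite-dimensional linear subspace of V and $P_d : V \to V_d$ is the orthogonal 
projection on $V_d$, then 

$$\frac{E(P_d(v)\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))} \xrightarrow[\lambda \to +\infty]{} P_d(u), \text{ strongly in } V. \qquad [7.29]$$

In addition, for any $\ell \in \mathcal{L}(V, \mathbb{R})$, 

$$\frac{E(\ell(P_d(v))\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))} \xrightarrow[\lambda \to +\infty,\ d \to +\infty]{} \ell(u); \qquad \frac{E(P_d(v)\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))} \rightharpoonup u, \text{ weakly in } V. \ \blacksquare$$

Corollary 7.1.- Assume that, in addition to the assumptions of the 
preceding theorem, the Fubini-Tonelli theorem applies. Then, 

$$E((v, \varphi_n)\, g(\lambda, J(v))) = (E(v\, g(\lambda, J(v))), \varphi_n), \quad \forall n \in \mathbb{N}^*$$

and 

$$\frac{E(v\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))} \xrightarrow[\lambda \to +\infty]{} u, \text{ weakly in } V. \ \blacksquare \qquad [7.28]$$
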

These results suggest that we may consider approximations of 

$$x^* = \arg\min\{F(y) : y \in S\}$$

having the form 

$$x^* \approx \frac{\sum_{i=1}^{ns} x_i\, g(\lambda, F(x_i))}{\sum_{i=1}^{ns} g(\lambda, F(x_i))}$$


or simply 

$$x^* \approx \frac{1}{ns} \sum_{i=1}^{ns} x_i\, g(\lambda, F(x_i)).$$

In practice, these approximations may be used in order to generate an initial 
population by using the code 

Listing 7.7. Generating an initial population 

function xpop = inipop_rep(f, n, npop, ntmax, alea, lambda, g)
    %
    % generates a population of npop elements from R^n
    % by using the representation formula involving function g
    % and parameter lambda
    % xpop has dimensions n x npop
    %
    xpop = zeros(n, npop);
    for i = 1:npop
        xac = zeros(n, 1);   % weighted sum of the trial points
        sw = 0;              % sum of the weights
        for nt = 1:ntmax
            xt = alea(n, 1);
            ft = f(xt);
            wt = g(lambda, ft);
            xac = xac + wt * xt;
            sw = sw + wt;
        end
        xpop(:, i) = xac / sw;
    end
    return;
end


Example 7.12.- Let us consider $\bar{x} = (1, 2, \ldots, n)$ and the Griewank's 
function 

$$F(x) = 1 + \frac{1}{200} \|x - \bar{x}\|^2 - \prod_{i=1}^{n} \cos\left(\frac{x_i - \bar{x}_i}{\sqrt{i}}\right).$$

We apply the method with ns = 2,500, $\lambda = 10$, n = 5, 
$g(\lambda, s) = \exp(-\lambda s)$. Random vectors are generated by variates from 
$N(0, \sigma)$, with $\sigma = 5$. The mean of the population generated is used as a 
starting point to a gradient descent with step $\mu = 0.1$. The evolution of the 
objective function is shown in Figure 7.17. 



Figure 7.17. Results for Griewank’s function of dimension 5 with the 
representation (horizontal axis: iteration) 
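
The starting point of Example 7.12 can be approximated with a short script. This is a sketch under assumptions: the argument of the cosine is taken to be $(x_i - \bar{x}_i)/\sqrt{i}$, $\sigma = 5$ is read as a standard deviation, and the gradient descent that follows in the example is not included.

```matlab
% Sketch: representation formula applied to a shifted Griewank function.
n = 5; ns = 2500; lambda = 10; sigma = 5;   % values quoted in Example 7.12
xbar = (1:n)';                              % minimizer (1, 2, ..., n)
% Assumed form of the objective (the cosine argument is an assumption)
F = @(x) 1 + sum((x - xbar).^2) / 200 - prod(cos((x - xbar) ./ sqrt((1:n)')));
g = @(lambda, s) exp(-lambda * s);          % weight function of the example
xac = zeros(n, 1);                          % weighted sum of the samples
sw = 0;                                     % sum of the weights
for i = 1:ns
    xt = sigma * randn(n, 1);               % Gaussian trial point
    wt = g(lambda, F(xt));
    xac = xac + wt * xt;
    sw = sw + wt;
end
xstar = xac / sw;                           % estimate used as starting point
```

With only 2,500 samples in dimension 5 the estimate is rough, which is consistent with its use as a starting point for a descent method and not as a final answer.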
We observe that these mathematical results remain valid in any dimension, 
including infinite-dimensional separable spaces. In fact, theorem 7.7 is the 
result of the following proposition: 

Proposition 7.1.- Assume that [7.30] and [7.31] are verified. Let 
$\ell \in \mathcal{L}(V, \mathbb{R}^d)$, $d \in \mathbb{N}^*$. Then 

$$\ell(u) = \lim_{\lambda \to +\infty} \frac{E(\ell(v)\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))}.$$

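Proof of the proposition.- 

1) We have 

$$\frac{E(\ell(v)\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))} = \frac{E(\ell(u)\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))} + \frac{E(\ell(v - u)\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))}.$$

Thus: 

$$\frac{E(\ell(v)\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))} = \ell(u) + \frac{E(\ell(v - u)\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))}. \qquad [7.32]$$

2) Let $\varepsilon \in (0, \varepsilon_0)$. We have 

$$E(\ell(v - u)\, g(\lambda, J(v))) = E(\ell(v - u)(\chi_\varepsilon^*(v) + \psi_\varepsilon^*(v))\, g(\lambda, J(v))).$$

Thus: 

$$E(\ell(v - u)\, g(\lambda, J(v))) = E(\ell(v - u)\chi_\varepsilon^*(v)\, g(\lambda, J(v))) + E(\ell(v - u)\psi_\varepsilon^*(v)\, g(\lambda, J(v))).$$

3) Since $\ell \in \mathcal{L}(V, \mathbb{R}^d)$, there exists a constant $M_p \in \mathbb{R}$ such that, for 
$v \in B_\varepsilon^*$, 

$$|\ell(v - u)|_p \le M_p \|v - u\| \le M_p \varepsilon \qquad [7.33]$$

while, for $v \in S_\varepsilon^*$, 

$$|\ell(v - u)|_p \le M_p \|v - u\| \le M_p(\|v\| + \|u\|) \le 2 M_p \alpha \qquad [7.34]$$

4) On the one hand, from Jensen’s inequality (Proposition 1.18), 

$$|E(\ell(v - u)\chi_\varepsilon^*(v)\, g(\lambda, J(v)))|_p \le E(|\ell(v - u)|_p\, \chi_\varepsilon^*(v)\, g(\lambda, J(v)))$$

and, from equation [7.33], 

$$E(|\ell(v - u)|_p\, \chi_\varepsilon^*(v)\, g(\lambda, J(v))) \le M_p\, \varepsilon\, E(\chi_\varepsilon^*(v)\, g(\lambda, J(v))).$$

On the other hand, Jensen’s inequality (Proposition 1.18) also shows that 

$$|E(\ell(v - u)\psi_\varepsilon^*(v)\, g(\lambda, J(v)))|_p \le E(|\ell(v - u)|_p\, \psi_\varepsilon^*(v)\, g(\lambda, J(v)))$$

and we have from [7.34] and [7.30]: 

$$E(|\ell(v - u)|_p\, \psi_\varepsilon^*(v)\, g(\lambda, J(v))) \le 2 M_p \alpha\, h_2(\lambda, \varepsilon).$$

5) Thus: 

$$|E(\ell(v - u)\, g(\lambda, J(v)))|_p \le \varepsilon M_p\, E(\chi_\varepsilon^*(v)\, g(\lambda, J(v))) + 2 M_p \alpha\, h_2(\lambda, \varepsilon),$$

so that 

$$\frac{|E(\ell(v - u)\, g(\lambda, J(v)))|_p}{E(g(\lambda, J(v)))} \le \frac{\varepsilon M_p\, E(\chi_\varepsilon^*(v)\, g(\lambda, J(v))) + 2 M_p \alpha\, h_2(\lambda, \varepsilon)}{E(g(\lambda, J(v)))} = \frac{\varepsilon M_p\, E(\chi_\varepsilon^*(v)\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))} + \frac{2 M_p \alpha\, h_2(\lambda, \varepsilon)}{E(g(\lambda, J(v)))}.$$

6) Since 

$$\frac{E(\chi_\varepsilon^*(v)\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))} \le 1 \quad \text{and} \quad \frac{h_2(\lambda, \varepsilon)}{E(g(\lambda, J(v)))} \le \frac{h_2(\lambda, \varepsilon)}{h_1(\lambda, \varepsilon)},$$

we have 

$$\frac{|E(\ell(v - u)\, g(\lambda, J(v)))|_p}{E(g(\lambda, J(v)))} \le \left[\varepsilon + 2\alpha \frac{h_2(\lambda, \varepsilon)}{h_1(\lambda, \varepsilon)}\right] M_p.$$

7) This inequality, combined with equation [7.31], yields 

$$\forall \varepsilon \in (0, \varepsilon_0): \quad \lim_{\lambda \to +\infty} \frac{|E(\ell(v - u)\, g(\lambda, J(v)))|_p}{E(g(\lambda, J(v)))} \le \varepsilon M_p.$$

Thus: 

$$\lim_{\lambda \to +\infty} \frac{|E(\ell(v - u)\, g(\lambda, J(v)))|_p}{E(g(\lambda, J(v)))} = 0$$

and we have 

$$\lim_{\lambda \to +\infty} \frac{E(\ell(v - u)\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))} = 0. \qquad [7.35]$$

8) The result is obtained by taking the limit for $\lambda \to +\infty$ in equation 
[7.32] and using equation [7.35]. ■ 
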


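Proof of the theorem.- 

1) Let $\{\varphi_1, \ldots, \varphi_d\}$ be an orthonormal basis of $V_d$ and $\ell : V \to \mathbb{R}^d$ be 
given by $\ell(v) = ((v, \varphi_1), \ldots, (v, \varphi_d))$. $\ell$ is linear and $|\ell(v)|_2 = \|P_d(v)\| \le \|v\|$. 
So, $\ell \in \mathcal{L}(V, \mathbb{R}^d)$ and, from Proposition 7.1: 

$$\lim_{\lambda \to +\infty} \frac{E((v, \varphi_n)\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))} = (u, \varphi_n).$$

Thus: 

$$\frac{E(P_d(v)\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))} = \sum_{n=1}^{d} \frac{E((v, \varphi_n)\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))}\, \varphi_n \xrightarrow[\lambda \to +\infty]{} \sum_{n=1}^{d} (u, \varphi_n)\, \varphi_n = P_d(u)$$

strongly in V and the first claim is established. 

2) Let $\ell \in \mathcal{L}(V, \mathbb{R})$, $\Phi = \{\varphi_n\}_{n \in \mathbb{N}^*} \subset V$ be an orthonormal Hilbert 
basis of V, $V_d = [\{\varphi_1, \ldots, \varphi_d\}]$, $P_d(v) = \sum_{n=1}^{d} (v, \varphi_n)\, \varphi_n$ the orthogonal 
projection of v onto $V_d$. Since $\ell$ is linear continuous, we have: $|\ell(v)| \le M_1 \|v\|$ 
and $|\ell(P_d(v))| \le M_1 \|P_d(v)\| \le M_1 \|v\|$. Since S is bounded, there 
is a constant $\alpha \in \mathbb{R}$ such that $\|v\| \le \alpha$, $\forall v \in S$ and we have 
$|\ell(P_d(v))| \le M_1 \alpha$, $\forall v \in S$. Let 

$$L(d, \lambda) = \frac{E(\ell(P_d(v))\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))}.$$

From Proposition 1.18: 

$$|L(d, \lambda)| \le \frac{E(|\ell(P_d(v))|\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))} \le M_1 \alpha.$$

3) Let $L = \limsup L(d, \lambda)$. Then, 

$$\forall \varepsilon > 0: \ \lambda > \lambda_0(\varepsilon) \text{ and } d > d_0(\varepsilon) \Rightarrow \sup L(d, \lambda) - \varepsilon \le L \le \sup L(d, \lambda).$$

Since $L(d, \lambda) \to \ell(P_d(u))$ for $\lambda \to +\infty$: 

$$\forall \eta > 0: \ \lambda > \lambda_1(\eta, d) \Rightarrow \ell(P_d(u)) - \eta \le L(d, \lambda) \le \ell(P_d(u)) + \eta.$$

Thus, $\lambda > \lambda_1(\eta, d) \Rightarrow \ell(P_d(u)) - \eta \le \sup L(d, \lambda) \le \ell(P_d(u)) + \eta$ and 

$$d > d_0(\varepsilon) \text{ and } \lambda > \lambda_1(\eta, d) \Rightarrow \ell(P_d(u)) - \eta - \varepsilon \le L \le \ell(P_d(u)) + \eta.$$

For $\eta \to 0$, we have $d > d_0(\varepsilon) \Rightarrow \ell(P_d(u)) - \varepsilon \le L \le \ell(P_d(u))$. 
Moreover, since $P_d(u) \to u$ strongly in V, $\ell(P_d(u)) \to \ell(u)$: we have 
$\ell(u) - \varepsilon \le L \le \ell(u)$. Thus, by taking the limit $\varepsilon \to 0$, we have $L = \ell(u)$. 

4) In an analogous manner, we obtain that $\liminf L(d, \lambda) = \ell(u)$. Thus, 
$\liminf L(d, \lambda) = \limsup L(d, \lambda) = \ell(u)$. Thus, $\ell(u) = \lim L(d, \lambda)$. 

5) Let 

$$u(d, \lambda) = \frac{E(P_d(v)\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))} = \sum_{n=1}^{d} a_n(\lambda)\, \varphi_n; \qquad a_n(\lambda) = \frac{E((v, \varphi_n)\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))}$$

and $\ell \in \mathcal{L}(V, \mathbb{R})$. We have $\ell(u(d, \lambda)) = \sum_{n=1}^{d} a_n(\lambda)\, \ell(\varphi_n) = L(d, \lambda)$. 
So, as previously established, $\ell(u(d, \lambda)) = L(d, \lambda) \to \ell(u)$ for $\lambda \to +\infty$, 
$d \to +\infty$. Thus, $u(d, \lambda) \rightharpoonup u$, weakly in V. ■ 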
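Proof of the corollary.- If $E((v, \varphi_n)\, g(\lambda, J(v))) = (E(v\, g(\lambda, J(v))), \varphi_n)$, then 

$$a_n(\lambda) = \frac{(E(v\, g(\lambda, J(v))), \varphi_n)}{E(g(\lambda, J(v)))} \implies u(d, \lambda) = \frac{P_d(E(v\, g(\lambda, J(v))))}{E(g(\lambda, J(v)))}.$$

Taking $d \to +\infty$, we have 

$$u_\lambda = \sum_{n=1}^{+\infty} a_n(\lambda)\, \varphi_n = \frac{E(v\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))}.$$

Since $u_\lambda \rightharpoonup u$, weakly in V, for $\lambda \to +\infty$, we obtain the result. ■ 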

Note 7.5.- In a finite-dimensional space, weak convergence implies 
strong convergence. Therefore, if V is a finite-dimensional space, then 

$$\frac{E(v\, g(\lambda, J(v)))}{E(g(\lambda, J(v)))} \xrightarrow[\lambda \to +\infty]{} u \text{ in } V. \qquad [7.36]$$
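
A closed-form case may help to see what [7.36] says. The setting below is an illustration chosen here, not taken from the text: $V = \mathbb{R}^n$, $v \sim N(0, \sigma^2 I)$ with $\sigma$ a standard deviation, $J(v) = \|v - c\|^2$ and $g(\lambda, s) = \exp(-\lambda s)$. The set S is taken to be all of $\mathbb{R}^n$, so the boundedness assumption is not satisfied and the computation is only a sanity check.

```latex
% Product of the Gaussian density and the weight, up to constants:
\exp\!\left(-\frac{\|v\|^2}{2\sigma^2}\right)\exp\!\left(-\lambda\|v-c\|^2\right)
  \;\propto\; \exp\!\left(-\Big(\lambda+\frac{1}{2\sigma^2}\Big)\,\|v-\mu_\lambda\|^2\right),
\qquad \mu_\lambda=\frac{\lambda}{\lambda+\frac{1}{2\sigma^2}}\,c .
% The ratio of expectations is the mean of this Gaussian:
\frac{E\big(v\,g(\lambda,J(v))\big)}{E\big(g(\lambda,J(v))\big)}
  =\frac{2\lambda\sigma^2}{1+2\lambda\sigma^2}\,c
  \;\xrightarrow[\lambda\to+\infty]{}\; c=\arg\min J .
```

For instance, with $\sigma = 5$ and $\lambda = 10$ the factor is $500/501$, so the weighted mean is already close to the minimizer.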


The choice of g may be guided by the following result: 

Proposition 7.2.- Assume that $S_\varepsilon^*$ is weakly compact for $\varepsilon \in (0, \varepsilon_0)$ and 
J is weakly l.s.c. (lower semicontinuous). Let $g : \mathbb{R}^2 \to \mathbb{R}$ be a continuous 
function such that $g > 0$ and $\xi \to g(\lambda, \xi)$ is strictly decreasing for any 
$\lambda > 0$. Then, [7.30] and [7.31] are satisfied. ■ 


The proof uses the following auxiliary result: 

Lemma 7.12.- Let $\varepsilon_0 > 0$ be small enough and $\varepsilon \in (0, \varepsilon_0)$. If $S_\varepsilon^*$ is weakly 
closed and J is weakly l.s.c., then 

$$\exists\, \theta = \theta(\varepsilon) > 0 \text{ such that } \max_{S_\varepsilon^*} g(\lambda, J(v)) \le g(\lambda, J(u)) \exp(-\lambda\theta). \qquad [7.37]$$

Moreover, for any $\delta > 0$ such that $\delta < \theta$, there is $\eta = \eta(\delta) > 0$ such that: 

$$\min_{B_\eta^*} g(\lambda, J(v)) \ge g(\lambda, J(u)) \exp(-\lambda\delta). \ \blacksquare$$
Proof of the lemma.- 

1) Since g is decreasing, equation [7.27] implies that 

$$\forall v \in S: \ g(\lambda, J(v)) \le g(\lambda, J(u)).$$

Thus: 

$$\max_{S_\varepsilon^*} g(\lambda, J(v)) \le g(\lambda, J(u)). \qquad [7.38]$$

2) There is a sequence $\{u_n\}_{n \ge 0} \subset S_\varepsilon^*$ such that 

$$g(\lambda, J(u_n)) \xrightarrow[n \to +\infty]{} \max_{S_\varepsilon^*} g(\lambda, J(v)).$$

Given that $S_\varepsilon^*$ is weakly closed, this sequence has a weak cluster point 
$\bar{u} \in S_\varepsilon^*$. 

3) Since J is weakly l.s.c., we have 

$$J(\bar{u}) \le \liminf J(u_n)$$

and 

$$g(\lambda, J(\bar{u})) \ge \limsup g(\lambda, J(u_n)) = \max_{S_\varepsilon^*} g(\lambda, J(v)). \qquad [7.39]$$

4) Equation [7.38] shows that $g(\lambda, J(\bar{u})) \le g(\lambda, J(u))$. This inequality, 
combined with equation [7.39], establishes that 

$$\max_{S_\varepsilon^*} g(\lambda, J(v)) = g(\lambda, J(\bar{u})) \le g(\lambda, J(u)). \qquad [7.40]$$

5) Let us assume that 

$$\max_{S_\varepsilon^*} g(\lambda, J(v)) = g(\lambda, J(u)). \qquad [7.41]$$

Equation [7.40] shows that $g(\lambda, J(\bar{u})) = g(\lambda, J(u))$. Since $\xi \to g(\lambda, \xi)$ 
is strictly decreasing, we have $J(\bar{u}) = J(u)$ and the uniqueness of 
u shows that $\bar{u} = u$. So, $\|\bar{u} - u\| = 0$. But $\bar{u} \in S_\varepsilon^*$ and $\|\bar{u} - u\| \ge \varepsilon$: we have 
$0 \ge \varepsilon > 0$, which is a contradiction. 

6) Since [7.41] leads to a contradiction, equation [7.38] implies that 

$$\max_{S_\varepsilon^*} g(\lambda, J(v)) < g(\lambda, J(u)). \qquad [7.42]$$

Thus, we have [7.37]. 

7) Let $\eta > 0$ be small enough and $m(\eta) = \max_{B_\eta^*} J(v)$. The continuity 
of J shows that $m(\eta) \to J(u)$ for $\eta \to 0+$, and the continuity of g shows that 

$$\frac{g(\lambda, m(\eta))}{g(\lambda, J(u))} \xrightarrow[\eta \to 0+]{} 1. \qquad [7.43]$$

Let $\delta > 0$ be given. From equation [7.43], there exists $\eta(\delta) > 0$ such that 

$$\forall\, 0 < \eta \le \eta(\delta): \ \frac{g(\lambda, m(\eta))}{g(\lambda, J(u))} \ge \exp(-\lambda\delta).$$

Thus, $g(\lambda, m(\eta)) \ge g(\lambda, J(u)) \exp(-\lambda\delta)$. In addition, 

$$\forall v \in B_\eta^*: \ m(\eta) \ge J(v).$$

Since g is strictly decreasing, 

$$\forall v \in B_\eta^*: \ g(\lambda, m(\eta)) \le g(\lambda, J(v))$$

and we have 

$$\min_{B_\eta^*} g(\lambda, J(v)) \ge g(\lambda, m(\eta)) \ge g(\lambda, J(u)) \exp(-\lambda\delta). \ \blacksquare$$


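Proof of the proposition.- Lemma 7.12 shows that (see equation [7.37]) 

$$E(\psi_\varepsilon^*(v)\, g(\lambda, J(v))) \le g(\lambda, J(u)) \exp(-\lambda\theta)\, E(\psi_\varepsilon^*(v)).$$

Let $h_2(\lambda, \varepsilon) = g(\lambda, J(u)) \exp(-\lambda\theta)\, \mu(S_\varepsilon^*)$. From the second part of 
Lemma 7.12: 

$$E(g(\lambda, J(v))) \ge E(\chi_\eta^*(v)\, g(\lambda, J(v))) \ge g(\lambda, J(u)) \exp(-\lambda\delta)\, E(\chi_\eta^*(v)).$$

Thus, by setting $h_1(\lambda, \varepsilon) = g(\lambda, J(u)) \exp(-\lambda\delta)\, \mu(B_\eta^*)$, we have 

$$E(g(\lambda, J(v))) \ge h_1(\lambda, \varepsilon) > 0$$

and, $\forall \varepsilon \in (0, \varepsilon_0)$: 

$$\frac{h_2(\lambda, \varepsilon)}{h_1(\lambda, \varepsilon)} = \frac{\exp(-\lambda\theta)\, g(\lambda, J(u))\, \mu(S_\varepsilon^*)}{\exp(-\lambda\delta)\, g(\lambda, J(u))\, \mu(B_\eta^*)} = \frac{\mu(S_\varepsilon^*)}{\mu(B_\eta^*)} \exp(-\lambda(\theta - \delta)) \xrightarrow[\lambda \to +\infty]{} 0. \ \blacksquare$$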


These results suggest a numerical method for the determination of u, 
based on the generation of finite samples of the random variables considered. 
For instance, we may choose $\lambda$ large enough and generate a sample 
$\mathcal{V} = (v_1, \ldots, v_{nr})$ of nr variates from v. Then, 

$$u \approx u_a = \sum_{i=1}^{nr} v_i\, g(\lambda, J(v_i)) \Big/ \sum_{i=1}^{nr} g(\lambda, J(v_i)), \qquad [7.44]$$

which corresponds to the approximations 

$$E(v\, g(\lambda, J(v))) \approx \frac{1}{nr} \sum_{i=1}^{nr} v_i\, g(\lambda, J(v_i)); \qquad E(g(\lambda, J(v))) \approx \frac{1}{nr} \sum_{i=1}^{nr} g(\lambda, J(v_i)).$$
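
When $\lambda$ is large, a direct evaluation of [7.44] with an exponential weight can underflow: every weight rounds to zero and the ratio becomes 0/0. A common remedy, shown in the sketch below, is to subtract the smallest sampled value of J before exponentiating, which leaves the ratio unchanged. The functional J, the weight $g(\lambda, s) = \exp(-\lambda s)$ and all numerical values are assumptions chosen for illustration.

```matlab
% Sketch: numerically stable evaluation of the weighted mean [7.44]
% for g(lambda, s) = exp(-lambda*s) (hypothetical J and parameters).
J = @(v) 100 + sum((v - [1; -2]).^2);   % hypothetical functional, large offset
lambda = 50; nr = 2000;
V = 3 * randn(2, nr);                   % sample, one variate per column
Jv = zeros(1, nr);
for i = 1:nr
    Jv(i) = J(V(:, i));
end
% Naive weights: exp(-lambda*Jv) underflows to 0 for every sample here
wn = exp(-lambda * Jv);
u_naive = (V * wn') / sum(wn);          % 0/0, i.e. NaN
% Shifted weights: the common factor exp(lambda*min(Jv)) cancels in the ratio
w = exp(-lambda * (Jv - min(Jv)));
ua = (V * w') / sum(w);                 % finite estimate of the minimizer
```

The best sample always receives the weight 1, so the denominator is at least 1 and the estimate stays finite whatever the offset of J.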

The sample $\mathcal{V}$ may be constructed by using the standard random number 
generator. For instance, a random element from $(\mathbb{N}^*)^k \times \mathbb{R}^k$ may be obtained 
by using, on the one hand, a sample $n = (n_1, \ldots, n_k)$ of k random elements 
from $\mathbb{N}^*$ and, on the other hand, a random element $x = (x_{n_1}, \ldots, x_{n_k})$ from 
$\mathbb{R}^k$. For this, it is necessary to use a random generator of integers for 
$n \in (\mathbb{N}^*)^k$ and a random generator of real numbers for $x \in \mathbb{R}^k$. Random 
elements of $\mathbb{R}_0^\infty$ may be obtained by a random selection of the index k 
followed by the generation of a random element from $(\mathbb{N}^*)^k \times \mathbb{R}^k$. From the 
purely mathematical standpoint, this procedure does not involve 
finite-dimensional approximations: the values of k and n span all the possible 
values and generate any value of $\mathbb{R}_0^\infty$. Of course, in practice, limitations arise 
from the computer (there is a maximum for the value of k) and from the 
generator used (some values of k may have very small probabilities and so 
become practically negligible). 

The procedure may be illustrated by an algorithm denoted by M1: let 
$k \in \mathbb{N}^*$ be given and consider the generation of an element of $(\mathbb{N}^*)^k \times \mathbb{R}^k$. For this, 
we consider two strictly positive real numbers $\theta \in \mathbb{R}$ and $\sigma \in \mathbb{R}$. 

M1.1: Set $i \leftarrow 0$. 

M1.2: Select k elements from $\mathbb{N}^*$ using $n = (m_1 + 1, \ldots, m_k + 1)$, where 
$(m_1, \ldots, m_k)$ is a sample of k variates from Poisson’s distribution 
$\mathcal{P}(\theta)$. 

M1.3: Select $x = (x_{n_1}, \ldots, x_{n_k})$ as a sample of k values from the Gaussian 
distribution $\mathcal{N}(0, \sigma)$. 

M1.4: Set $v_i = \sum_{j=1}^{k} x_{n_j} \varphi_{n_j}$ (or the analogous sum over an alternative family 
of elements indexed by $n_j$). 

M1.5: Increment i: $i \leftarrow i + 1$. If i = nr, then approximate the solution by 
using equation [7.44]. Otherwise, return to M1.2. 
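
The sampling steps of M1 can be sketched as follows. Several choices are assumptions made for illustration only: the family $\varphi_n(t) = \sqrt{2}\sin(n\pi t)$ on (0, 1), the grid on which each $v_i$ is stored, the parameter values, and the reading of $\sigma$ as a standard deviation. Poisson variates are drawn with Knuth's multiplication method so that no toolbox is needed; the final evaluation of [7.44] is omitted.

```matlab
% Sketch of steps M1.2 to M1.4 (hypothetical basis and parameter values).
theta = 3; sigma = 1; k = 4; nr = 5;
t = linspace(0, 1, 101);                   % grid on which v_i is evaluated
phi = @(n, t) sqrt(2) * sin(n * pi * t);   % hypothetical orthonormal family
V = zeros(nr, numel(t));                   % row i holds v_i on the grid
N = zeros(nr, k);                          % indices n_j drawn for each v_i
for i = 1:nr
    % M1.2: k Poisson(theta) variates (Knuth's multiplication method)
    m = zeros(1, k);
    for j = 1:k
        p = rand;
        while p > exp(-theta)
            m(j) = m(j) + 1;
            p = p * rand;
        end
    end
    n = m + 1;                             % shift so that indices lie in N*
    % M1.3: k Gaussian coefficients
    x = sigma * randn(1, k);
    % M1.4: v_i = sum over j of x_{n_j} * phi_{n_j}
    for j = 1:k
        V(i, :) = V(i, :) + x(j) * phi(n(j), t);
    end
    N(i, :) = n;
end
```

Repeated indices are allowed by the procedure; in that case the corresponding coefficients simply add up.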


An alternative M2 is the following: let there be given four strictly positive 
numbers $k_0 \in \mathbb{N}^*$, $\theta_0 \in \mathbb{R}$, $\theta \in \mathbb{R}$ and $\sigma \in \mathbb{R}$. 

M2.1: Set $i \leftarrow 0$. 

M2.2: Select $k = k_0 + m \in \mathbb{N}^*$, where $m \in \mathbb{N}$ is a variate from the Poisson’s 
distribution $\mathcal{P}(\theta_0)$. 

M2.3: Select k elements from $\mathbb{N}^*$ using $n = (m_1 + 1, \ldots, m_k + 1)$, where 
$(m_1, \ldots, m_k)$ is a sample of k variates from the Poisson’s distribution 
$\mathcal{P}(\theta)$. 

M2.4: Select $x = (x_{n_1}, \ldots, x_{n_k})$ as a sample of k variates from the Gaussian 
distribution $\mathcal{N}(0, \sigma)$. 

M2.5: Set $v_i = \sum_{j=1}^{k} x_{n_j} \varphi_{n_j}$ (or the analogous sum over an alternative family 
of elements indexed by $n_j$). 

M2.6: Increment i: $i \leftarrow i + 1$. If i = nr, approximate the solution by using 
equation [7.44]. Otherwise, return to M2.2. 


The readers may find examples of applications in finite-dimensional or 
infinite-dimensional situations in the literature (see, for instance, [DE 07, 
BEZ 08, BEZ 05, BEZ 10] and [ZID 13]). 



8 


Reliability-Based Optimization 


Structural optimization looks for the best design of elements involved in 
a structure. Usually, the objective is cost or mass reduction and restrictions 
must be taken into account: for instance, a given available space imposes a 
restriction or failure criteria must be taken into account. Regarding practical 
applications, the objective and the restrictions may contain parameters affected 
by uncertainty: for instance, experimental data or material parameters having 
some variability. In addition, the external loads may appear as uncertain: for 
instance, the design of a structure submitted to wind, sea waves or random 
vibration must take into account the intrinsic variability and uncertainty of the 
external loads. 

Whenever stochastic or variable parameters are part of the optimization 
problem, difficulties arise, such as: 

- small variations of the parameters may correspond to large variations of 
the optimal design (lack of robustness); 

- the optimal design cannot be exactly obtained in practice, since 
fabrication tolerances and geometrical errors arise (variability and 
uncertainty); 

- the non-homogeneity of the materials, the variability of the external loads 
and of the boundary conditions may lead to situations where the optimal 
design does not satisfy the restrictions and becomes unsafe for some operating 
conditions (unreliability). 

In order to take into account possible implementation errors and variations 
in operating conditions, uncertainty must be taken into account in the design 
procedure. A complete characterization of the uncertainty of the optimal 
design involves the use of the approaches presented in Chapter 7, but the 
designer may also be simply interested in obtaining a single design having 
some robustness properties, such as the satisfaction of the restrictions with a 
prescribed probability. 

A first approach consists of using safety factors, i.e. in modifying the 
results or the optimization problem in order to achieve the goal of 
probabilistic constraint satisfaction. Safety factors are generally multiplicative 
coefficients to be applied to loads or the variables. The main limitations of 
this approach are that, on the one hand, the coefficients are specific to 
particular situations and cannot be easily extended to other situations and, on 
the other hand, they are generally produced by experience and the 
observation of historic failures, which makes their determination a difficult 
problem for situations where the number of observed failures is insufficient. 

In order to surmount these difficulties, an alternative consists of 
introducing restrictions involving the probabilities of some events leading 
to failure. For instance, we may introduce a restriction on the maximal 
probability of such an event, which corresponds to a minimal reliability. This 
approach leads to reliability-based design optimization (RBDO), which is 
presented in the following. 


8.1. The model situation 

Reliability is characterized by the reliability index, denoted by $\beta$. The 
reliability index is defined for structures whose state is defined by a vector of 
parameters $x \in \mathbb{R}^n$, referred to as the physical variables. In agreement with 
the definitions below, the failure corresponds to positive values of a real 
variable $Z = g(x)$, where g is a given function, referred to as limit curve, or 
limit state curve. $\mathbb{R}^n$ is split into two disjoint regions, S and D. S corresponds 
to a negative sign of Z and is referred to as safe region: a point $x \in S$ 
corresponds to safe operation. D corresponds to a positive sign of Z and is 
referred to as failure region: a point $x \in D$ corresponds to failure. The limit 
curve C is the boundary common to both the regions, $C = \partial S = \partial D$: 

$$S = \{x \in \mathbb{R}^n : g(x) < 0\}, \quad D = \{x \in \mathbb{R}^n : g(x) > 0\}, \quad C = \{x \in \mathbb{R}^n : g(x) = 0\}.$$
Uncertainty is introduced by considering, for instance, that the state of the 
structure is not given by the nominal parameters x but by 
$r(x, v) = x + v$, where v is a random variable. In this case, the limit curve is 
given by: 

$$G(x, v) = g(r(x, v)) = 0.$$

The situation is analogous if g depends on uncertain parameters v, or r is 
a more complex function: in all these situations, the limit curve becomes a 
random function. Thus, the variable defining the failure becomes: 

$$Z = G(x, v) = g(r(x, v)). \qquad [8.1]$$

Z, as defined by equation [8.1], is a random variable and the inequalities 
$Z \le 0$ and $Z > 0$ define complementary events having associated probabilities. 
For instance, the probability of failure associated to a given x is: 

$$P_f(x) = P(Z > 0 \mid x = x), \qquad [8.2]$$

while the reliability associated to x is: 

$$1 - P_f(x) = P(Z \le 0 \mid x = x). \qquad [8.3]$$

The model RBDO problem is stated below. 

Model Problem 8.1.- Let there be a target maximal failure probability $P_t$. 
Determine $x \in \mathbb{R}^n$ such that: 

$$x = \arg\min\{F(x) : x \in A\}, \quad A = \{x \in \mathbb{R}^n : P_f(x) \le P_t\}. \ \blacksquare \qquad [8.4]$$

This formulation looks for the minimal value of the objective function F 
for a prescribed maximal failure probability $P_t$. Additional deterministic or 
stochastic restrictions may be introduced in the definition of the admissible set 
A. 


As an alternative, we may consider the determination of the maximal 
reliability for a prescribed maximal value $F_t$ of the objective function F. 

Model Problem 8.2.- Let there be a target maximal cost $F_t$. Determine 
$x \in \mathbb{R}^n$ such that: 

$$x = \arg\min\{P_f(x) : x \in A\}, \quad A = \{x \in \mathbb{R}^n : F(x) \le F_t\}. \ \blacksquare \qquad [8.5]$$

Example 8.1.- A simple example is furnished by the situation where a 
structure has a maximal admissible stress equal to $\sigma_{max}$: in this case, we may 
take $x_1$ and $x_2$ as being the stress $\sigma$ and the maximal admissible stress $\sigma_{max}$, 
respectively, i.e. $x = (\sigma, \sigma_{max})$. Then, $g(x) = x_1 - x_2 = \sigma - \sigma_{max}$. Other 
classical examples are furnished by Wöhler’s curves (or limit curves), 
usually given by equations of the form g(x) = 0. 
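
For the limit state of Example 8.1, the failure probability [8.2] can be estimated by simulation and compared with a closed-form value. The sketch below assumes $r(x, v) = x + v$ with independent centered Gaussian components; the nominal stresses and standard deviations are hypothetical values chosen for illustration.

```matlab
% Sketch: Monte Carlo estimate of Pf = P(Z > 0) for g(x) = x1 - x2.
sig = 200; sig_max = 250;           % hypothetical nominal stress and admissible stress
s1 = 20; s2 = 15;                   % hypothetical standard deviations of v1, v2
N = 1e5;                            % number of simulations
v = [s1 * randn(1, N); s2 * randn(1, N)];
Z = (sig + v(1, :)) - (sig_max + v(2, :));   % Z = g(x + v)
pf_mc = mean(Z > 0);                % fraction of simulated failures
% Z is Gaussian with mean sig - sig_max and variance s1^2 + s2^2,
% so the exact value follows from the normal distribution function
pf_exact = 0.5 * erfc((sig_max - sig) / sqrt(2 * (s1^2 + s2^2)));
```

With these values Z has mean -50 and standard deviation 25, so the exact failure probability is the standard normal distribution function evaluated at -2, about 0.0228.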


8.2. Reliability index 

The standard reliability approach introduces a reliability index $\beta$, such that 
increasing $\beta$ corresponds to increasing the reliability, i.e. decreasing the 
probability of failure. $\beta$ is defined by using normalized independent variables 
$u \in \mathbb{R}^m$, such that: 

- each component of u has a mean equal to zero and a variance equal to 
one; 

- u has a cumulative function F depending on $\|u\|$: $F(u) = \Phi(\|u\|)$; 

- $t \to \Phi(t)$ is strictly decreasing. 

For instance, the classical Nataf’s transformation [NAT 62] brings v to a 
vector u of independent Gaussian variables and satisfies these assumptions. 
Thus, increasing $\|u\|$ corresponds to decreasing the probability of failure $P_f$: 
this property is used in order to define the reliability index (see below). 

From the mathematical standpoint, we consider a transformation 
$\tau : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n \times \mathbb{R}^m$ connecting the couples (x, v) and (x, u). Thus, there exists 
a function $T : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m$ such that v = T(x, u). 
In general situations, T is a complex transformation and its determination 
may involve some difficulty. In some simple situations, T is an affine 
transformation T(x, u) = A(x)u + B(x), where A(x) and B(x) are 
matrices, independent from v and u. For instance, if, on the one hand, v is 
formed by independent variables and, on the other hand, the distribution of 
each component $v_i$ corresponds to the cumulative function $F_i(u_i) = \Phi(|u_i|)$, 
having mean $m_i$ and standard deviation $\sigma_i$, we may consider normalized 
random variables given by $u_i = (v_i - m_i)/\sigma_i$ and we have 
$v = m + \mathrm{diag}(\sigma)\, u$, where $m = (m_1, \ldots, m_n)$ is the vector of the means and 
$\sigma = (\sigma_1, \ldots, \sigma_n)$ is the vector of standard deviations. 

By introducing T, we have: 

$$Z = H(x, u) = G(x, T(x, u)); \quad R(x, u) = r(x, T(x, u)) \qquad [8.6]$$

and it is natural to consider as working space: 

$$V = \{(x, u) : x \in \mathbb{R}^n,\ u \in \mathbb{R}^m,\ u \text{ vector of normalized independent variables}\}.$$

V is the hybrid space. We have $V = V_x \times V_u$, where $V_x = \mathbb{R}^n$ is the 
physical space and $V_u$ is a space of normalized random variables (mean zero, 
variance one), called normalized space. We have: 

$$S = \{(x, u) \in V : H(x, u) < 0\}, \quad D = \{(x, u) \in V : H(x, u) > 0\}, \quad C = \{(x, u) \in V : H(x, u) = 0\}.$$

The reliability index $\beta$ is determined in the normalized space and is defined 
as: 

$$\beta(x) = \|u(x)\| = \min\{\|u\| : H(x, u) = 0\} = \min\{\|u\| : H(x, u) \ge 0\}. \qquad [8.7]$$

u(x) is associated to the physical variables v(x) = T(x, u(x)) and 
corresponds to the state of the structure r(x) = r(x, v(x)) or 
R(x) = R(x, u(x)). The points u(x), v(x), r(x) and R(x) are usually 
referred to as the most probable failure points (or simply most probable 
points) associated to the state x. Sometimes, the expression design point is 
also used. 

The determination of $\beta(x)$ is referred to as reliability analysis of the point 
x. 
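
When H is affine in u, the minimization [8.7] has a closed-form solution, which gives a convenient check for numerical procedures. The sketch below uses the limit state $g(x) = x_1 - x_2$ with $v = \mathrm{diag}(\sigma)\, u$ (zero means); the numerical values are hypothetical and the formula assumes that the nominal point is safe, i.e. b < 0.

```matlab
% Sketch: reliability index for an affine limit state H(u) = b + a'*u.
x = [200; 250];                 % hypothetical nominal state (stress, admissible stress)
s = [20; 15];                   % hypothetical standard deviations
a = [s(1); -s(2)];              % gradient of H with respect to u
b = x(1) - x(2);                % H at u = 0 (negative: nominal point is safe)
H = @(u) b + a' * u;
ustar = -b * a / (a' * a);      % closest point of the hyperplane H = 0 to the origin
beta = norm(ustar);             % reliability index, equal to |b| / norm(a)
pf = 0.5 * erfc(beta / sqrt(2));% failure probability for Gaussian u
```

Here the index is 50/25 = 2, and for Gaussian variables the failure probability is the standard normal distribution function evaluated at -2.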

8.3. FORM 

The determination of u(x) and $\beta(x)$ requires the solution of a constrained 
optimization problem (equation [8.7] for the given x), which may lead to a 
computationally expensive problem involving difficulties, namely for 
experimentally determined limit curves. 

A first attempt in order to increase the efficiency by saving computational 
effort at this level is introduced by the First Order Reliability Method (FORM), 
which tends to furnish a rapid evaluation of $u(x^{(k)})$ and $\beta(x^{(k)})$. 

When a starting point $u^{(0)}$ is given, FORM generates a sequence 
$u^{(1)}, u^{(2)}, \ldots$ which is expected to converge to u(x), by using a sequence of 
affine approximations of the limit curve C. For a given $u^{(i)}$, we determine an 
affine function $u \to H^{(i)}(u)$ such that $H(x, u) \approx H^{(i)}(u)$ on the 
neighborhood of $u^{(i)}$. Then, we determine: 

$$u^{(i+1)} = \arg\min\{\|u\| : H^{(i)}(u) \ge 0\}.$$

We observe that this equation corresponds to the approximation of the 
failure region D by a half-space $D^{(i)}$. FORM is very popular due to its easy 
implementation and rapid convergence (even if the limit may be different 
from u(x)). Breitung and Der Kiureghian [BRE 84, KIU 87, KIU 91] have 
proposed an analogous method, based on quadratic approximations of C, 
usually referred to as Second Order Reliability Method (SORM). 
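
The FORM iteration described above can be sketched for a fixed x. The affine approximation is taken here as the first-order Taylor expansion of H at $u^{(i)}$, for which the minimization over the half-space has an explicit solution as long as the origin is safe for the linearized limit state. The limit state H and its gradient are hypothetical.

```matlab
% Sketch of the FORM iteration with Taylor linearization (hypothetical H).
H  = @(u) u(1) + u(2) + 0.1 * u(1)^2 - 3;   % failure when H >= 0
dH = @(u) [1 + 0.2 * u(1); 1];              % gradient of H
u = [0; 0];                                 % starting point u^(0)
for i = 1:20
    g = dH(u);
    % closest point to the origin on the hyperplane H(u_i) + g'*(u - u_i) = 0
    u = ((g' * u - H(u)) / (g' * g)) * g;
end
beta = norm(u);                             % FORM estimate of the reliability index
```

At convergence the iterate lies on the limit curve and is parallel to the gradient of H, which is the first-order optimality condition of [8.7].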

Both the approaches FORM/SORM involve the construction of 
approximations of the limit curve C. In the situations where such an 
approximation is not available, $\beta$ may be evaluated by using a statistical 
approximation: a sample of points of D is generated and $\beta$ is approximated 
by the empirical value determined on the sample: 

$$\beta \approx \min\{\|x - y_i\| : (y_1, \ldots, y_N) \text{ sample of points from } D\}.$$

For Gaussian variables, this approximation is equivalent to a Monte Carlo 
approximation for the evaluation of the failure probability. 

8.4. The bi-level or double-loop method 

The model RBDO problem 8.1 reads as: 

$$x = \arg\min\{F(x) : H(x, u(x)) \le 0 \text{ and } \beta(x) = \|u(x)\| \ge \beta_t\}, \qquad [8.8]$$

where $\beta_t$ is the target reliability index. The alternative formulation 8.2 reads 
as: 

$$\beta = \arg\max\{\beta(x) : H(x, u(x)) \le 0 \text{ and } F(x) \le F_t\}. \qquad [8.9]$$

Both the problems [8.8] and [8.9] contain difficulties which are still open, 
namely those connected to their non-convexity and the evaluation of $\beta(x)$. 
Both these problems are bi-level optimization problems. For instance, equation 
[8.8] looks for x at the superior level, but requires the determination of u(x) 
at the inferior level. Analogously, equation [8.9] looks for $\beta$ at the superior level, 
but requires the determination of x at the inferior level. Typical iterations for 
the solution consist of generating a sequence $x^{(0)}, x^{(1)}, x^{(2)}, \ldots$ as follows: 

Algorithm 8.1. Bi-level or double-loop RBDO 
Require: kmax > 0, precmin > 0; 

Require: an admissible starting point $x^{(0)}$; 

Require: a method for the determination of u(x) for a given x; 

Require: a method for updating x for a known u(x); 

1. k = 0; 

2. inferior level (internal loop): determine $u^{(k)} = u(x^{(k)})$; 

3. superior level (external loop): determine $x^{(k+1)}$ by updating $x^{(k)}$ for 
$u(x^{(k)})$ known; 

4. k = k + 1; 

5. Test the stopping conditions: $\|x^{(k)} - x^{(k-1)}\| \le$ precmin or $k \ge$ kmax. If 
the stopping conditions are not satisfied, go to 2. 

return u ( x 


Examples of methods for updating $x^{(k)}$ may be found in the literature. 
The determination of $u^{(k)}$ is referred to as the reliability level, while the 
determination of $x^{(k+1)}$ is referred to as the optimization level. The resolution 
may require a large number of calls to each level. Thus, simplified methods 
have been proposed in the literature, for instance, the use of 
FORM at the inferior level. We examine other approaches in the following. 
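
The double-loop structure can be sketched on a toy problem. Everything specific here is an assumption made for illustration and is not taken from the text: the limit state H(x, u) = A.u - x (failure when H >= 0), the cost J(x) = x, the HL-RF-type iteration used at the inferior level, and the rescaling rule used at the superior level. For this toy case beta(x) = x/||A||, so the expected design is x = beta_t ||A||.

```python
import numpy as np

A = np.array([3.0, 4.0])   # hypothetical coefficients, ||A|| = 5
BETA_T = 2.0               # target reliability index

def H(x, u):
    """Toy limit state (assumption): failure when H(x, u) >= 0."""
    return A @ u - x

def grad_u_H(x, u):
    return A

def inner_level(x, n_iter=20):
    """Reliability level: HL-RF-type iteration for the point of the limit
    state H(x, u) = 0 that is closest to the origin."""
    u = np.zeros_like(A)
    for _ in range(n_iter):
        g = grad_u_H(x, u)
        u = (g @ u - H(x, u)) / (g @ g) * g
    return u

def outer_update(x, u):
    """Optimization level (hypothetical rule): J(x) = x is increasing, so the
    cheapest admissible design has beta(x) = BETA_T; rescale x accordingly."""
    return x * BETA_T / np.linalg.norm(u)

def double_loop(x0, kmax=50, precmin=1e-10):
    x, k = x0, 0
    while True:
        u = inner_level(x)              # step 2: internal loop
        x_new = outer_update(x, u)      # step 3: external loop
        k += 1                          # step 4
        if abs(x_new - x) < precmin or k > kmax:   # step 5
            return x_new, inner_level(x_new)
        x = x_new

x_opt, u_opt = double_loop(20.0)   # the starting point must satisfy x0 > 0
print(x_opt, np.linalg.norm(u_opt))
```

Each outer iteration triggers a full inner reliability analysis, which is the cost that the single-loop and safety-factor approaches try to avoid.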

An alternative approach consists of generating $x^{(k+1)}$ by determining 
successively: 

$$u^{(k)} = \arg\min \{ \|u\| : H(x^{(k)}, u) \geq 0 \text{ and } \|u\| \geq \beta_t \}$$

and 

$$x^{(k+1)} = \arg\min \{ F(x) : H(x, u^{(k)}) \leq 0 \}.$$

In this case, $u^{(k)} \neq u(x^{(k)})$. 


8.5. One-level or single-loop approach 

One way to save computational effort is to reduce the cost associated 
with the internal level (reliability analysis). FORM/SORM approximations 
may be considered, but an alternative is to transform the bi-level problem 
into a one-level problem, where the variables $x$ and $u$ are considered as 
independent and the equality $\|u\| = \beta(x)$ is achieved only at the limit 
of the iterations, i.e. at the optimal point. These ideas lead to the 
introduction of a new objective function $J_d$, where a convenient “penalty” 
term is introduced in order to ensure that the optimal point $(x, u)$ 
satisfies $\|u\| = \beta(x)$. For instance, we may consider a positive function 
$d : \mathbb{R}^2 \to \mathbb{R}$ such that, on the one hand, $a \mapsto d(a, b)$ is 
increasing for any $b > 0$ and, on the other hand, $b \mapsto d(a, b)$ is 
decreasing for any $a > 0$. Then, we set: 

$$J_d(x, u) = J(x)\, d(\|u\|, \beta(x)),$$

and we consider the problem: 

$$(x, u) = \arg\min \{ J_d(x, u) : (x, u) \in A_d \};$$

$$A_d = \{ (x, u) : H(x, u) = 0, \ \|u\| \geq \beta_t \}. \qquad [8.10]$$

Taking into account the properties of $d$, it is expected that optimization 
procedures will decrease $\|u\|$ while increasing $\beta(x)$. At the limit, we 
expect to obtain a point verifying $H(x, u) \geq 0$, $\|u\| \geq \beta_t$, minimizing 
$\|u\|$ and corresponding to the largest $\beta(x)$ compatible with the 
restrictions. Together, these properties yield the equality 
$\|u\| = \beta(x) \geq \beta_t$, since the minimality of $\|u\|$ combined with 
$H(x, u) \leq 0$ (i.e. with the admissibility of $(x, u)$) leads to $u = u(x)$. 
This new formulation is expected to decrease the computational cost, since 
we do not need to perform a reliability analysis at each iteration: the 
method generates a sequence 
$(x^{(0)}, u^{(0)}), (x^{(1)}, u^{(1)}), (x^{(2)}, u^{(2)}), \ldots$ such that the 
condition $u^{(k)} = u(x^{(k)})$ is not necessarily satisfied at each iteration 
$k$. The problem in equation [8.10] has to be solved by an appropriate 
descent method and may be implemented by the following algorithm: 

Algorithm 8.2. One-level or single-loop RBDO 
Require: kmax > 0, precmin > 0; 

Require: an admissible starting point $(x^{(0)}, u^{(0)})$; 

Require: a descent method for the minimization of $J_d$; 

1. $k = 0$; 

2. Determine $(x^{(k+1)}, u^{(k+1)}) \in A_d$ by one iteration of the descent method; 

3. $k = k + 1$; 

4. Test the stopping conditions: $\|x^{(k)} - x^{(k-1)}\|$ < precmin or $k$ > kmax. If 
the stopping conditions are not satisfied, go to 2. 

return $u(x^{(k)})$ 


The reader will find examples of $d$ in the literature. A simple, convenient 
choice is $d(a, b) = a$, which corresponds to: 

$$J_d(x, u) = J(x)\, \|u\|.$$

This choice has been shown to be effective in computations, notably in 
situations where the response of the mechanical system may be analytically 
determined (i.e. systems where a model explicitly connecting the design 
variables and the response is available). As with the bi-level approach, 
equation [8.10] presents difficulties that remain open, namely those 
connected to non-convex optimization: the iterations may converge to local 
minima. In addition, this approach tends to give identical relative 
importance to the values of $\|u\|$ and $J$, which may be considered 
inconvenient. Attempts to generate better balanced objective functions may 
be found in the literature, such as 

$$J_{d,R}(x, u) = J_d(x, u)\, J(E(R(x, u))),$$

where $E(R(x, u))$ is the mean of $R(x, u)$, the variables defining the state 
of the system. In this case, we determine: 

$$(x, u) = \arg\min \{ J_{d,R}(x, u) : (x, u) \in A_{d,R} \};$$

$$A_{d,R} = \{ (x, u) : H(R(x, u), u) = 0, \ \|u\| \geq \beta_t \}.$$

Choosing a modified and better balanced objective function such as 
$J_{d,R}$ often generates a significant increase in computational time, 
which makes this approach unsuitable. As an alternative, semi-analytic 
methods, such as safety factors, may be used. 

8.6. Safety factors 

A safety factor is usually a real number associated with a variable and 
intended, generally through multiplication, to increase reliability. Safety 
factors are generally introduced for critical parameters such as 
external loads, maximal stresses, displacements and deformations. They 
are generally derived from the observation of previous failures and are 
connected to empirical design rules. Their empirical determination often 
involves experimentation, inverse problems and calibration. However, the 
basic idea of determining a correction to be applied to the variables may be 
exploited in another way: we show in the following that corrections may be 
generated by using optimality conditions. In such a situation, the corrections 
are usually referred to as optimal safety factors (OSF), but the reader must 
keep in mind that these are not empirical safety factors, only numerical 
corrections intended to improve the preceding approaches. 

In order to generate a correction, let us recall that $u(x)$ verifies: 

$$\|u(x)\| = \min \{ \|u\| : H(x, u) \geq 0 \}.$$

This problem may be rewritten in the following equivalent form: 

$$\|u(x)\|^2 = \min \{ \|u\|^2 : H(x, u) \geq 0 \}.$$

Let us introduce the Lagrange multiplier $\lambda \geq 0$ associated with the 
restriction $H(x, u) \geq 0$. The Lagrangian associated with this problem is: 

$$L(u, \lambda) = \|u\|^2 - \lambda H(x, u).$$

The stationary points of $L$ satisfy: 

$$\nabla_u L(u(x), \lambda) = 0, \quad \lambda \geq 0, \quad H(x, u(x)) \geq 0, \quad \lambda H(x, u(x)) = 0.$$

Thus, 

$$2u(x) - \lambda \nabla_u H(x, u(x)) = 0 \implies u(x) = \frac{\lambda}{2} \nabla_u H(x, u(x))$$

and 

$$\beta(x)^2 = \|u(x)\|^2 = \frac{\lambda^2}{4} \|\nabla_u H(x, u(x))\|^2.$$

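Assume that $\beta(x) > 0$. Then, 

$$\frac{\lambda^2}{4} \|\nabla_u H(x, u(x))\|^2 \neq 0 \implies \lambda \neq 0 \text{ and } \nabla_u H(x, u(x)) \neq 0.$$

Thus, 

$$\frac{\lambda}{2} = \frac{\beta(x)}{\|\nabla_u H(x, u(x))\|}$$

and we have: 

$$u(x) = \beta(x) \frac{\nabla_u H(x, u(x))}{\|\nabla_u H(x, u(x))\|}.$$

If $\beta(x) = \beta_t$, we have: 

$$u(x) = \beta_t \frac{\nabla_u H(x, u(x))}{\|\nabla_u H(x, u(x))\|}. \qquad [8.11]$$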
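
As an illustration, consider a linear limit state, which is a hypothetical example and not taken from the text: $H(x, u) = a^t u - b(x)$ with $a \neq 0$ and $b(x) > 0$. Every step of the derivation can then be carried out by hand, and the closest failure point agrees with the expression of $u(x)$ in terms of the normalized gradient:

```latex
\begin{align*}
H(x, u) &= a^t u - b(x), \qquad \nabla_u H(x, u) = a, \\
u(x) &= \frac{b(x)}{\|a\|^2}\, a, \qquad
\beta(x) = \|u(x)\| = \frac{b(x)}{\|a\|}, \\
\frac{\lambda}{2} &= \frac{\beta(x)}{\|\nabla_u H\|} = \frac{b(x)}{\|a\|^2}, \\
\beta(x)\, \frac{\nabla_u H}{\|\nabla_u H\|}
  &= \frac{b(x)}{\|a\|} \cdot \frac{a}{\|a\|} = u(x).
\end{align*}
```

In this case the multiplier is strictly positive, so the constraint is active and $H(x, u(x)) = 0$, as required by the complementarity condition.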


Equation [8.11] may be used in different ways: on the one hand, it defines 
a set of nonlinear equations which may be solved in order to determine $u(x)$; 
on the other hand, it may also be used in order to update $u^{(k)}$. For instance, we 
may choose a number of sub-iterations $i_{max} > 0$ and set $u^{(k+1)} = u^{(k+1, i_{max})}$, 
where: 

$$u^{(k+1, i)} = \beta_t \frac{\nabla_u H(x, u^{(k+1, i-1)})}{\|\nabla_u H(x, u^{(k+1, i-1)})\|}, \quad i = 1, \ldots, i_{max}; \qquad u^{(k+1, 0)} = u^{(k)}. \qquad [8.12]$$
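
The sub-iterations can be sketched as a fixed-point loop. This sketch assumes the update u <- beta_t * grad_u H / ||grad_u H||, read off from [8.11]. The limit state H(x, u) = u_1 + 2 u_2 + 0.1 u_1 u_2 - x and the names `grad_u_H` and `sub_iterations` are hypothetical and serve only as an illustration.

```python
import numpy as np

def grad_u_H(x, u):
    """Gradient in u of the toy limit state
    H(x, u) = u1 + 2*u2 + 0.1*u1*u2 - x (assumption of this sketch)."""
    return np.array([1.0 + 0.1 * u[1], 2.0 + 0.1 * u[0]])

def sub_iterations(x, u_k, beta_t, i_max):
    """Apply i_max sub-iterations u <- beta_t * grad / ||grad||, starting from u_k."""
    u = np.asarray(u_k, dtype=float)
    for _ in range(i_max):
        g = grad_u_H(x, u)
        u = beta_t * g / np.linalg.norm(g)
    return u

beta_t = 3.0
u_new = sub_iterations(x=5.0, u_k=[0.0, 0.0], beta_t=beta_t, i_max=30)

# Residual of the fixed-point equation u = beta_t * grad / ||grad||
g = grad_u_H(5.0, u_new)
residual = np.linalg.norm(u_new - beta_t * g / np.linalg.norm(g))
print(u_new, residual)
```

By construction every iterate has norm beta_t. The loop only aligns u with the gradient direction and does not by itself enforce H(x, u) = 0.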


An alternative correction, involving a single sub-iteration, is furnished by: 

$$u^{(k+1)} = u(x^{(k)}) + \left( \beta_t - \beta(x^{(k)}) \right) \frac{\nabla_u H(x^{(k)}, u(x^{(k)}))}{\|\nabla_u H(x^{(k)}, u(x^{(k)}))\|}. \qquad [8.13]$$


These corrections may be interpreted as the use of multiplicative safety 
factors in situations where $v = x + \mathrm{diag}(\sigma) u$. Recalling that 
$H(x, u) = G(x, T(x, u))$, we have: 

$$\nabla_u H(x, u) = [\nabla_u T(x, u)]^t \nabla_v G(x, T(x, u)).$$

Thus, 

$$\|\nabla_u H(x, u)\|^2 = [\nabla_v G(x, T(x, u))]^t\, \nabla_u T(x, u)\, [\nabla_u T(x, u)]^t\, \nabla_v G(x, T(x, u))$$

and we have: 

$$u(x) = \beta(x) \frac{[\nabla_u T(x, u)]^t \nabla_v G(x, T(x, u))}{\sqrt{[\nabla_v G(x, T(x, u))]^t\, \nabla_u T(x, u)\, [\nabla_u T(x, u)]^t\, \nabla_v G(x, T(x, u))}}.$$


Taking into account that $T(x, u) = x + \mathrm{diag}(\sigma) u$, we have 
$\nabla_u T(x, u) = \mathrm{diag}(\sigma)$ and 

$$u_i(x) = \sigma_i \eta_i, \qquad \eta_i = \beta(x) \frac{\dfrac{\partial G}{\partial v_i}(x, T(x, u))}{\sqrt{\sum_j \sigma_j^2 \left( \dfrac{\partial G}{\partial v_j}(x, T(x, u)) \right)^2}}.$$

Let $\gamma_i = \sigma_i / x_i$ be the coefficient of variation of $v_i$: we have 
$u_i(x) = \eta_i \gamma_i x_i$ and $v_i(x) = (1 + \eta_i \gamma_i) x_i$. Thus, the factor 
$S_i = 1 + \eta_i \gamma_i$ appears as a multiplicative factor to be applied to the 
deterministic variables $x$ in order to ensure reliability. In such a situation, 
$S_i$ is called the OSF associated with variable $i$. 
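
The componentwise expression u_i = sigma_i eta_i can be checked numerically against the general matrix form of u(x). The values of `sigma`, `x_nominal`, `grad_G` and `beta` below are hypothetical, as are the function names. The last line simply evaluates S_i = 1 + eta_i gamma_i as written in the text.

```python
import numpy as np

def u_from_gradients(beta, grad_u_T, grad_v_G):
    """General form: u = beta * [grad_u T]^t grad_v G / ||[grad_u T]^t grad_v G||."""
    w = grad_u_T.T @ grad_v_G
    return beta * w / np.sqrt(w @ w)

def eta_componentwise(beta, sigma, grad_v_G):
    """eta_i = beta * dG/dv_i / sqrt(sum_j sigma_j^2 (dG/dv_j)^2)."""
    return beta * grad_v_G / np.sqrt(np.sum(sigma**2 * grad_v_G**2))

# Hypothetical data for illustration only
sigma = np.array([0.5, 2.0, 1.0])        # standard deviations
x_nominal = np.array([10.0, 40.0, 5.0])  # deterministic design variables
grad_G = np.array([1.0, -0.3, 2.0])      # gradient of G with respect to v
beta = 3.0

u_matrix = u_from_gradients(beta, np.diag(sigma), grad_G)
eta = eta_componentwise(beta, sigma, grad_G)
u_comp = sigma * eta                     # u_i = sigma_i * eta_i

gamma = sigma / x_nominal                # coefficients of variation
osf = 1.0 + eta * gamma                  # S_i as written in the text
print(u_matrix, u_comp, osf)
```

Both routes give the same vector, whose norm equals beta. A negative partial derivative of G produces a factor below one for the corresponding variable.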

Equation [8.11] may be exploited by the preceding algorithms in order to 
update $u^{(k)}$. For instance, we may use the following algorithm: 


Algorithm 8.3. RBDO using OSF 

Require: kmax > 0, precmin > 0, $\beta_t > 0$; 

Require: an admissible starting point $(x^{(0)}, u^{(0)})$; 

1. $k = 0$; 

2. determine $u^{(k+1)}$ by updating $u^{(k)}$ (for instance, use equation [8.12] or 
equation [8.13]); 

3. determine $x^{(k+1)} = \arg\min \{ J(x) : H(x, u^{(k+1)}) \leq 0 \}$ (or other 
updating method); 

4. $k = k + 1$; 

5. Test the stopping conditions: $\|x^{(k)} - x^{(k-1)}\|$ < precmin or $k$ > kmax. If 
the stopping conditions are not satisfied, go to 2. 

return $x^{(k)}$. 
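
The loop of the algorithm above can be sketched on a toy problem. The assumptions are hypothetical and chosen so that step 3 has a closed form: a scalar design x, cost J(x) = x, and limit state H(x, u) = A.u + 0.1 u_1 u_2 - x, with failure when H >= 0. Step 2 uses sub-iterations of the form u <- beta_t * grad_u H / ||grad_u H||.

```python
import numpy as np

A = np.array([3.0, 4.0])   # hypothetical coefficients
BETA_T = 2.0               # target reliability index

def H(x, u):
    """Toy limit state (assumption): failure when H(x, u) >= 0."""
    return A @ u + 0.1 * u[0] * u[1] - x

def grad_u_H(x, u):
    return A + 0.1 * np.array([u[1], u[0]])

def update_u(x, u, i_max=5):
    """Step 2: a few sub-iterations u <- BETA_T * grad / ||grad||."""
    for _ in range(i_max):
        g = grad_u_H(x, u)
        u = BETA_T * g / np.linalg.norm(g)
    return u

def update_x(u):
    """Step 3 for J(x) = x: the smallest x such that H(x, u) <= 0."""
    return A @ u + 0.1 * u[0] * u[1]

def rbdo_osf(x0, u0, kmax=100, precmin=1e-12):
    x, u, k = x0, np.asarray(u0, dtype=float), 0
    while True:
        u = update_u(x, u)                          # step 2
        x_new = update_x(u)                         # step 3
        k += 1                                      # step 4
        if abs(x_new - x) < precmin or k > kmax:    # step 5
            return x_new, u
        x = x_new

x_opt, u_opt = rbdo_osf(20.0, [1.0, 0.0])
print(x_opt, np.linalg.norm(u_opt), H(x_opt, u_opt))
```

No inner minimization is solved here: each pass costs a few gradient evaluations, which is the saving that the safety-factor correction aims for.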